
Identification of Conclusive Association Entities by Biomedical Association Mining
Rey-Long Liu
Dept. of Medical Informatics, Tzu Chi University, Taiwan

Outline
- Background
- The proposed technique: AMICAE
- Empirical evaluation
- Conclusion

Background

Conclusive Association Entities (CAEs)
- Biomedical entities
  - Examples: genes, diseases, and chemicals
- Conclusive Association Entities (CAEs) in a scholarly article a
  - Those biomedical entities that are specific targets on which conclusive findings about their associations are reported in a

- Dopamine is not a specific target, so it is not a CAE
- Results on TDHL are not conclusive enough, so it is not a CAE

Even if an entity appears in the title, it is not necessarily a CAE (it may not be a specific target)

Why Identification of CAEs?
- Goal: To support analysis of conclusive findings on specific entities
  - A routine task of biomedical scientists
    - CTD (Comparative Toxicogenomics Database)
    - GHR (Genetics Home Reference)
    - OMIM (Online Mendelian Inheritance in Man)
- Challenge: It is quite difficult to
  - Identify specific entities, and
  - Estimate how conclusive the findings on entities are

Our Goal
- Developing a technique, AMICAE (Association Mining for Identification of CAEs)
  - Given: Title and abstract of a scholarly article a
  - Output: An indicator to improve CAE identification by association mining

Related Work (1/4)
- Article indexing by biomedical terms
  - Goal: A kind of text classification task
  - Example: Indexing of articles by MeSH (Medical Subject Headings)
- But we have a different goal
  - Prioritizing CAEs that appear in an article
  - Rather than classifying the article into certain categories

Related Work (2/4)
- Prediction of new associations by existing associations
  - Goal: Predicting new possible associations that deserve further analysis
- But we have a different goal
  - Identification of CAEs that have already been published in articles

Related Work (3/4)
- Extraction of biomedical entity associations
  - Goal: Extracting (recognizing) biomedical associations mentioned in articles
- But an entity in an association may not be a CAE
  - The association may not be conclusive enough
  - The association may not always exist, or
  - The association may be only related to the background of the article

Related Work (4/4)
- Estimation of entity-article relatedness
  - Typical techniques: ontology and statistical indicators that work on full-text articles
- We propose to improve them by association mining
  - Association mining to refine entity-article relatedness estimation
  - Applicable even if full texts are not available

The Proposed Technique: AMICAE

Main Ideas
- [Figure labels: potential associations (identified from a set of articles); inferred associations]
- Given an article a, if e1 and e3 are candidate entities in a, they are likely to be CAEs of a

Step 1/4
- Given a candidate entity e in the title and the abstract of an article a, estimate its strength of being a potential CAE of a
- The top-2 entities in each article are potential CAEs of the article

Step 2/4
- Given a set of articles, construct a network of potential associations based on the potential CAEs, and accordingly produce inferred (indirect) associations
- S_{e1,e2} = Number of articles having <e1, e2> as a potential association
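The counting in this step can be illustrated with a minimal sketch. It assumes the potential CAEs of each article (the top-2 entities from Step 1) are already available as a list per article. The slide does not say how inferred (indirect) associations are produced, so the two-hop rule used here (two entities that share a neighbour but have no direct link) is only an assumption, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def potential_association_counts(potential_caes_per_article):
    """Return S[(e1, e2)]: number of articles having <e1, e2> as a potential association."""
    counts = Counter()
    for entities in potential_caes_per_article:
        # Sort so that each unordered pair is always stored under the same key.
        for e1, e2 in combinations(sorted(set(entities)), 2):
            counts[(e1, e2)] += 1
    return counts


def inferred_associations(counts):
    """Assumed rule: link two entities that share a neighbour but are not directly linked."""
    neighbours = defaultdict(set)
    for e1, e2 in counts:
        neighbours[e1].add(e2)
        neighbours[e2].add(e1)
    inferred = set()
    for linked in neighbours.values():
        for e1, e2 in combinations(sorted(linked), 2):
            if (e1, e2) not in counts:
                inferred.add((e1, e2))
    return inferred


# Toy input: the potential CAEs (top-2 entities) of three articles.
articles = [["e1", "e2"], ["e2", "e3"], ["e1", "e2"]]
counts = potential_association_counts(articles)
inferred = inferred_associations(counts)
print(counts)    # direct potential associations with their article counts
print(inferred)  # indirect associations under the assumed two-hop rule
```

In this toy input, e1 and e3 never appear together as potential CAEs, but both are linked to e2, so the pair is reported as an inferred association.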

Step 3/4
- Estimate CorStrength_{e,a} (correlation strength between entity e and article a, based on association mining)
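The transcript does not preserve the formula for CorStrength, so the following is only a hypothetical illustration of the idea, not the definition used in the talk. It assumes that the score of an entity in an article sums the mined association counts between that entity and the other candidate entities of the same article, with an assumed fixed weight (`inferred_weight`) for inferred associations.

```python
def cor_strength(entity, candidates, direct_counts, inferred_pairs, inferred_weight=0.5):
    """Hypothetical score: mined association support between `entity` and the other candidates."""
    total = 0.0
    for other in candidates:
        if other == entity:
            continue
        pair = tuple(sorted((entity, other)))
        if pair in direct_counts:
            total += direct_counts[pair]   # support from direct potential associations
        elif pair in inferred_pairs:
            total += inferred_weight       # assumed, smaller credit for inferred associations
    return total


# Toy mined network (assumed values) and the candidate entities of one article.
direct_counts = {("e1", "e2"): 2, ("e2", "e3"): 1}
inferred_pairs = {("e1", "e3")}
candidates = ["e1", "e2", "e3", "e4"]

scores = {e: cor_strength(e, candidates, direct_counts, inferred_pairs) for e in candidates}
print(scores)
```

Under these assumptions, an entity such as e4 that has no mined association with any other candidate receives a score of zero, which matches the intuition that associated candidates are more likely to be CAEs.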

Step 4/4
- Integrate CorStrength and other indicators by RankingSVM
  - Various entity-article relatedness indicators are tested
  - RankingSVM serves as a learning-based method to integrate CorStrength and these indicators
  - We will see how CorStrength improves these indicators in CAE identification
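As a rough illustration of how a ranking SVM can combine several indicators, the sketch below trains a linear model on the pairwise hinge loss with plain subgradient descent. This is a simplified stand-in, not the RankingSVM implementation used in the talk. The two-feature vectors (imagined as one baseline indicator plus CorStrength), the hyperparameters, and all names are assumptions.

```python
import random


def train_pairwise_ranker(articles, epochs=200, lr=0.1, reg=0.01, seed=0):
    """articles: one list per article of (feature_vector, is_cae) for each candidate entity."""
    rng = random.Random(seed)
    pairs = []
    for candidates in articles:
        pos = [f for f, is_cae in candidates if is_cae]
        neg = [f for f, is_cae in candidates if not is_cae]
        # Each training example is the difference between a CAE and a non-CAE of the same article.
        pairs += [[p_i - n_i for p_i, n_i in zip(p, n)] for p in pos for n in neg]
    dim = len(pairs[0])
    w = [0.0] * dim
    for _ in range(epochs):
        rng.shuffle(pairs)
        for d in pairs:
            margin = sum(w_i * d_i for w_i, d_i in zip(w, d))
            for i in range(dim):
                # Subgradient of reg/2 * |w|^2 + max(0, 1 - w.d)
                grad = reg * w[i] - (d[i] if margin < 1 else 0.0)
                w[i] -= lr * grad
    return w


def score(w, features):
    """Linear ranking score; higher means more likely to be a CAE."""
    return sum(w_i * f_i for w_i, f_i in zip(w, features))


# Toy data: features are [baseline indicator, CorStrength] (assumed, already normalized).
toy_articles = [
    [([0.9, 0.8], True), ([0.2, 0.1], False), ([0.4, 0.3], False)],
    [([0.7, 0.9], True), ([0.3, 0.2], False)],
]
weights = train_pairwise_ranker(toy_articles)
for candidates in toy_articles:
    ranked = sorted(candidates, key=lambda c: score(weights, c[0]), reverse=True)
    print([is_cae for _, is_cae in ranked])
```

Pairs are formed only within an article, so the model learns to order the candidates of one article rather than to classify entities globally.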

Empirical Evaluation

The Data
- Source of data: CTD (Comparative Toxicogenomics Database)
  - An online database of associations between three types of entities (chemicals, genes, and diseases)
  - Many biomedical scientists are recruited to frequently update the entity associations
  - Only conclusive associations are curated
- About 50% of all articles in CTD (60,507 articles) have their CAEs appearing in their titles or abstracts

The Baselines
- Typical indicators to estimate relatedness between an entity e and an article a
  - S_e(a) = Set of sentences (in a) mentioning e
  - S_{e,x}(a) = Set of sentences (in a) where e and x co-occur
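The two sentence sets that the baseline indicators build on can be sketched as follows. The matching here is naive case-insensitive substring search, which is an assumption made for brevity. A real system would rely on proper entity recognition, and the function names are hypothetical.

```python
def sentences_mentioning(sentences, entity):
    """S_e(a): indices of the sentences in article a that mention entity e."""
    return {i for i, s in enumerate(sentences) if entity.lower() in s.lower()}


def cooccurrence_sentences(sentences, entity, other):
    """S_{e,x}(a): indices of the sentences in which e and x co-occur."""
    return sentences_mentioning(sentences, entity) & sentences_mentioning(sentences, other)


# Toy title and abstract, already split into sentences.
article = [
    "Dopamine levels were measured.",
    "Gene X is associated with disease Y.",
    "Disease Y was reduced by dopamine.",
]
s_e = sentences_mentioning(article, "dopamine")
s_ex = cooccurrence_sentences(article, "dopamine", "disease Y")
print(s_e, s_ex)
```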

Evaluation Criteria
- MAP (Mean Average Precision)
  - If the CAEs of an article are ranked high, the average precision (AP) for the article will be higher
  - MAP is simply the average of the AP values for all articles
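A short sketch of how AP and MAP are commonly computed is given below. It uses the standard definition, in which AP averages the precision at the rank of each CAE. The talk may have used a variant, and the toy rankings are assumptions.

```python
def average_precision(ranked_entities, caes):
    """AP for one article: mean of the precision values at the ranks where CAEs appear."""
    hits = 0
    precision_sum = 0.0
    for rank, entity in enumerate(ranked_entities, start=1):
        if entity in caes:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(caes) if caes else 0.0


def mean_average_precision(results):
    """MAP: average of the AP values over all articles."""
    return sum(average_precision(ranked, caes) for ranked, caes in results) / len(results)


# Toy rankings: (entities ranked by the system, true CAEs of the article).
results = [
    (["e1", "e4", "e3"], {"e1", "e3"}),
    (["e2", "e5"], {"e2"}),
]
ap_first = average_precision(*results[0])
map_score = mean_average_precision(results)
print(ap_first, map_score)
```

In the first toy article the CAEs sit at ranks 1 and 3, so AP is (1/1 + 2/3) / 2 = 5/6. The second article has its only CAE at rank 1, giving AP = 1 and a MAP of 11/12.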

Average P@X
- If most CAEs of an article are ranked at the top-X positions, P@X for the article will be high
- Average P@X is simply the average of the P@X values for all articles
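P@X can be sketched in the same way. The code assumes the usual definition, the fraction of the top-X ranked entities that are CAEs, and the toy rankings are made up for illustration.

```python
def precision_at_x(ranked_entities, caes, x):
    """P@X for one article: fraction of the top-X ranked entities that are CAEs."""
    top = ranked_entities[:x]
    return sum(1 for entity in top if entity in caes) / x


def average_precision_at_x(results, x):
    """Average P@X: mean of the P@X values over all articles."""
    return sum(precision_at_x(ranked, caes, x) for ranked, caes in results) / len(results)


# Toy rankings: (entities ranked by the system, true CAEs of the article).
results = [
    (["e1", "e4", "e3"], {"e1", "e3"}),
    (["e5", "e2"], {"e2"}),
]
p_at_1 = average_precision_at_x(results, 1)
p_at_2 = average_precision_at_x(results, 2)
print(p_at_1, p_at_2)
```

Because the denominator is always X, an article with fewer than X CAEs cannot reach P@X = 1, which is worth keeping in mind when comparing values across different X.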

Result
- When CorStrength is added (i.e., there are six indicators integrated by RankingSVM), the performance is further improved significantly

- When CorStrength is added (i.e., there are six indicators integrated by RankingSVM), a larger percentage of test articles (over 95%) can have their CAEs ranked at top positions (top-1 to top-3)

Conclusion

- Identification of CAEs in the title and the abstract of an article is challenging
  - It is difficult to identify those specific entities on which the research findings reported in the article are conclusive enough
- We develop AMICAE, which
  - Provides helpful information to improve CAE identification by association mining
  - Two candidate entities in an article are likely to be CAEs of the article if a strong association between them is mined from a collection of articles