Identification of Conclusive Association Entities by Biomedical Association Mining. Rey-Long Liu, Dept. of Medical Informatics, Tzu Chi University, Taiwan
Outline: (1) Background, (2) The proposed technique: AMICAE, (3) Empirical evaluation, (4) Conclusion
Background
Conclusive Association Entities (CAEs). Biomedical entities: for example, genes, diseases, and chemicals. CAEs in a scholarly article a: those biomedical entities that are specific targets on which conclusive findings about their associations are reported in a.
Example: Dopamine is not a specific target, so it is not a CAE. Results on TDHL are not conclusive enough, so TDHL is not a CAE either.
Even if an entity appears in the title, it is not necessarily a CAE (it may not be a specific target).
Why Identification of CAEs? Goal: to support analysis of conclusive findings on specific entities, a routine task of biomedical scientists. Examples: CTD (Comparative Toxicogenomics Database), GHR (Genetics Home Reference), OMIM (Online Mendelian Inheritance in Man). Challenge: it is quite difficult to (1) identify specific entities and (2) estimate how conclusive the findings on the entities are.
Our Goal: developing a technique, AMICAE (Association Mining for Identification of CAEs). Given: the title and abstract of a scholarly article a. Output: an indicator to improve CAE identification by association mining.
Related Work (1/4): Article indexing by biomedical terms. Goal: a kind of text classification task. Example: indexing of articles by MeSH (Medical Subject Headings). But we have a different goal: prioritizing the CAEs that appear in an article, rather than classifying the article into certain categories.
Related Work (2/4): Prediction of new associations from existing associations. Goal: predicting new possible associations that deserve further analysis. But we have a different goal: identification of CAEs that have already been published in articles.
Related Work (3/4): Extraction of biomedical entity associations. Goal: extracting (recognizing) biomedical associations mentioned in articles. But an entity in an association may not be a CAE: the association may not be conclusive enough, the association may not always exist, or the association may be only related to the background of the article.
Related Work (4/4): Estimation of entity-article relatedness. Typical techniques: ontology and statistical indicators that work on full-text articles. We propose to improve them by association mining: association mining refines entity-article relatedness estimation and is applicable even if full texts are not available.
The Proposed Technique: AMICAE
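Main Ideas. [Figure: a network of potential associations (identified from a set of articles) and the inferred associations derived from it.] Given an article a, if e1 and e3 are candidate entities in a and an association between them is found in this network, they are likely to be CAEs of a.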
Step 1/4: Given a candidate entity e in the title and the abstract of an article a, estimate its strength of being a potential CAE of a. The top-2 entities in each article are taken as the potential CAEs of the article.
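The top-2 selection in Step 1 can be sketched as follows. This is only an illustration: the slide does not give the formula for the strength score, so the function `potential_caes`, the entity names, and the score values are hypothetical, and the scores are assumed to have been computed already.

```python
from typing import Dict, List


def potential_caes(strength: Dict[str, float], k: int = 2) -> List[str]:
    """Return the k candidate entities with the highest strength scores."""
    # Sort by descending score; ties are broken by entity name so the
    # result is deterministic (the tie rule is an assumption).
    ranked = sorted(strength.items(), key=lambda item: (-item[1], item[0]))
    return [entity for entity, _ in ranked[:k]]


# Hypothetical strength scores for the candidate entities of one article.
scores = {"e1": 0.9, "e2": 0.4, "e3": 0.7}
top2 = potential_caes(scores)
```

With these illustrative scores, e1 and e3 are kept as the potential CAEs of the article.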
Step 2/4: Given a set of articles, construct a network of potential associations based on the potential CAEs, and accordingly produce inferred (indirect) associations. S_{e1,e2} = number of articles having <e1, e2> as a potential association.
Step 3/4: Estimate CorStrength_{e,a} (the correlation strength between entity e and article a, based on association mining).
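A minimal sketch of Step 2, under stated assumptions. The count S_{e1,e2} follows the definition on the slide. The rule for inferred associations is not specified there, so the sketch assumes the simplest reading: two entities are indirectly associated if they share a neighbor in the network but are not directly linked. All function and variable names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set


def count_potential_associations(articles: Iterable[List[str]]) -> Counter:
    """S[{e1, e2}] = number of articles whose potential CAEs include both."""
    counts: Counter = Counter()
    for caes in articles:
        for e1, e2 in combinations(sorted(set(caes)), 2):
            counts[frozenset((e1, e2))] += 1
    return counts


def inferred_associations(counts: Counter) -> Set[FrozenSet[str]]:
    """Assumed rule: link two entities that share a neighbor but no edge."""
    neighbors: Dict[str, Set[str]] = {}
    for pair in counts:
        a, b = tuple(pair)
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    inferred: Set[FrozenSet[str]] = set()
    for linked in neighbors.values():
        for x, y in combinations(sorted(linked), 2):
            pair = frozenset((x, y))
            if pair not in counts:  # keep only indirect links
                inferred.add(pair)
    return inferred


# Potential CAEs (top-2 entities) of three hypothetical articles.
articles = [["e1", "e2"], ["e2", "e3"], ["e1", "e2"]]
S = count_potential_associations(articles)
indirect = inferred_associations(S)
```

Here <e1, e2> is counted twice and <e2, e3> once, and <e1, e3> is inferred through the shared neighbor e2.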
Step 4/4: Integrate CorStrength and other indicators by RankingSVM. Various entity-article relatedness indicators are tested, with RankingSVM as a learning-based method to integrate CorStrength and these indicators. We will see how CorStrength improves these indicators in CAE identification.
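To illustrate how a ranking SVM can combine several indicators, the sketch below builds the pairwise training examples that such a learner typically uses: within one article, the indicator vector of each CAE minus that of each non-CAE is a positive example, and its negation is a negative example. A linear classifier trained on these differences yields weights for ranking. This is the generic pairwise formulation, not necessarily the exact setup of the study, and the feature values are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def pairwise_examples(
    features: Sequence[Vector], is_cae: Sequence[bool]
) -> List[Tuple[Vector, int]]:
    """Build (difference vector, label) pairs for one article."""
    examples: List[Tuple[Vector, int]] = []
    for xi, yi in zip(features, is_cae):
        for xj, yj in zip(features, is_cae):
            if yi and not yj:  # a CAE should outrank a non-CAE
                diff = tuple(a - b for a, b in zip(xi, xj))
                examples.append((diff, 1))
                examples.append((tuple(-d for d in diff), -1))
    return examples


# One article with three candidate entities and two indicators each
# (hypothetical values; the second column stands in for CorStrength).
features = [(0.5, 2.0), (0.25, 0.0), (0.75, 1.0)]
is_cae = [True, False, True]
pairs = pairwise_examples(features, is_cae)
```

Pairs are formed only within an article, since the goal is to rank the candidate entities of the same article against each other.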
Empirical Evaluation
The data. Source of data: CTD (Comparative Toxicogenomics Database), an online database of associations between three types of entities (chemicals, genes, and diseases). Many biomedical scientists are recruited to frequently update the entity associations, and only conclusive associations are curated. About 50% of all articles in CTD (60,507 articles) have their CAEs appearing in their titles or abstracts.
The Baselines: typical indicators to estimate relatedness between an entity e and an article a. S_e(a) = set of sentences (in a) mentioning e. S_{e,x}(a) = set of sentences (in a) where e and x co-occur.
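The two sentence sets can be computed as in the sketch below. It makes simplifying assumptions that are not from the slides: sentences are split with a naive regular expression, and an entity is "mentioned" when its name occurs as a case-insensitive substring (a real system would use biomedical entity recognition and normalization). The sets hold sentence indices, and the example abstract is invented.

```python
import re
from typing import List, Set


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sentences_mentioning(sentences: List[str], entity: str) -> Set[int]:
    """S_e(a): indices of sentences that mention the entity."""
    return {i for i, s in enumerate(sentences) if entity.lower() in s.lower()}


def cooccurrence_sentences(sentences: List[str], e: str, x: str) -> Set[int]:
    """S_{e,x}(a): indices of sentences where e and x co-occur."""
    return sentences_mentioning(sentences, e) & sentences_mentioning(sentences, x)


# Invented abstract, for illustration only.
abstract = (
    "Gene A is linked to disease B. Chemical C was also tested. "
    "Disease B worsened with chemical C."
)
sents = split_sentences(abstract)
s_e = sentences_mentioning(sents, "disease B")
s_ex = cooccurrence_sentences(sents, "disease B", "chemical C")
```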
Evaluation Criteria: MAP (Mean Average Precision). If the CAEs of an article are ranked high, the average precision (AP) for the article will be higher. MAP is simply the average of the AP values for all articles.
Average P@X: If most CAEs of an article are ranked at the top-X positions, P@X for the article will be high. Average P@X is simply the average of the P@X values for all articles.
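A short sketch of MAP, using the standard definition of average precision (the mean of the precision values at the ranks where a CAE appears). The slides do not spell out the exact variant used in the evaluation, so treat this as the textbook form. The rankings and CAE sets are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """AP for one article: mean precision at the rank of each CAE."""
    hits = 0
    precision_sum = 0.0
    for rank, entity in enumerate(ranked, start=1):
        if entity in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """MAP: average of the AP values over all articles."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# (ranked candidate entities, true CAEs) for two hypothetical articles.
runs = [
    (["e1", "e2", "e3"], {"e1", "e3"}),  # AP = (1/1 + 2/3) / 2
    (["e4", "e5"], {"e5"}),              # AP = (1/2) / 1
]
map_score = mean_average_precision(runs)
```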
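Average P@X can be sketched in the same way. The code assumes the common definition, P@X = (number of CAEs among the top-X ranked entities) / X; the slides do not state the exact formula, and the example data are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def precision_at(ranked: Sequence[str], relevant: Set[str], x: int) -> float:
    """P@X for one article: fraction of the top-X entities that are CAEs."""
    return sum(1 for entity in ranked[:x] if entity in relevant) / x


def average_precision_at(
    runs: List[Tuple[Sequence[str], Set[str]]], x: int
) -> float:
    """Average P@X: mean of the P@X values over all articles."""
    return sum(precision_at(r, rel, x) for r, rel in runs) / len(runs)


# (ranked candidate entities, true CAEs) for two hypothetical articles.
runs = [
    (["e1", "e2", "e3"], {"e1", "e3"}),
    (["e4", "e5"], {"e5"}),
]
avg_p_at_1 = average_precision_at(runs, 1)
avg_p_at_2 = average_precision_at(runs, 2)
```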
Result: When CorStrength is added (i.e., there are six indicators integrated by RankingSVM), the performance is further improved significantly.
When CorStrength is added (i.e., there are six indicators integrated by RankingSVM), a larger percentage of test articles (over 95%) can have their CAEs ranked at top positions (top-1 to top-3).
Conclusion
Identification of CAEs in the title and the abstract of an article is challenging: it is difficult to identify those specific entities on which the research findings reported in the article are conclusive enough. We develop AMICAE, which provides helpful information to improve CAE identification by association mining: two candidate entities in an article are likely to be CAEs of the article if a strong association between them is mined from a collection of articles.