Faculty of Computer Science © 2006 CMPUT 605 March 31, 2008 Towards Applying Text Mining and Natural Language Processing for Biomedical Ontology Acquisition.

Presentation transcript:

Faculty of Computer Science © 2006 CMPUT 605 March 31, 2008
Towards Applying Text Mining and Natural Language Processing for Biomedical Ontology Acquisition
Inniss T., Light M., Thomas G., Lee J., Grassi M., Williams A. TMBIO (2006)
Amit Satsangi

© 2006 Department of Computing Science CMPUT 605
Focus
• Ontology for describing age-related macular degeneration (AMD)
• Comparison of the accuracy of three methods for ontology acquisition
  – Natural Language Processing (NLP)
  – Text Mining (SAS Text Miner)
  – Human Expert
• Manual and ad hoc knowledge acquisition
• IDOCS (Intelligent Distributed Ontology Consensus System)

Introduction
• No common, standardized vocabulary exists for classifying disease types for certain eye diseases
• Clinicians, dispersed geographically, may use different terms to describe the same condition
• The research aims to extract the feature and attribute descriptions that make up the vocabulary of AMD, and to build an ontology from them

Related Work
• Much research since the 1990s on applying NLP techniques in medicine, biomedicine, etc.
• NLP and text data mining are recognized as playing an important role in this endeavor
• Research has focused on online repositories such as Medline and PubMed
• NLP systems developed: MedLEE, UMLS, GENIES, etc.

IDOCS

Methodology
• Four clinical experts in retinal diseases were enlisted to view 100 sample eye images of AMD
• The experts were in different geographic locations
• They described their observations using digital voice recorders, with no artificially imposed vocabulary constraints
• Another retinal expert manually parsed the transcribed text: extracting keywords, organizing keywords into categories, etc.

Results: Human Experts

Methodology: NLP
• NLP: used for information extraction and automatic summarization
• Goal: identify short sequences of words (collocations) whose meaning goes over and above the meaning composed directly from their parts, e.g. “extreme programming”
• The Ngram Statistics Package (NSP) was used for collocation discovery among bigrams
• Word-pair associations were measured by pointwise mutual information (PMI)

Methodology: NLP
• A larger PMI indicates a stronger association between the two words
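The slide's formula did not survive in the transcript, so the following is only a minimal sketch of the standard textbook definition, PMI(x, y) = log2( P(x, y) / (P(x) P(y)) ), estimated from raw counts of adjacent word pairs. It is not the NSP implementation used in the paper; the function name `bigram_pmi` and the sample sentence are hypothetical, chosen for illustration.

```python
import math
from collections import Counter


def bigram_pmi(tokens):
    """Return {(w1, w2): PMI} for every adjacent word pair in `tokens`.

    Assumption: probabilities are plain maximum-likelihood estimates
    (count / total), with no smoothing or frequency cutoff.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), count in bigrams.items():
        p_xy = count / n_bi            # joint probability of the pair
        p_x = unigrams[w1] / n_uni     # marginal probability of first word
        p_y = unigrams[w2] / n_uni     # marginal probability of second word
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Hypothetical transcript fragment, not taken from the study's data.
tokens = "soft drusen near the macula soft drusen in the macula".split()
scores = bigram_pmi(tokens)
for pair, pmi in sorted(scores.items(), key=lambda kv: -kv[1])[:3]:
    print(pair, round(pmi, 2))
```

On small samples PMI tends to favor rare pairs, so a practical pipeline would probably add a minimum-frequency threshold before ranking candidates.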

© 2006 Department of Computing Science CMPUT 605 Results: NLP

Methodology: Text Mining (SAS Text Miner)
• A collection of documents (the corpus) is the input to any text mining algorithm
• The corpus is broken into tokens, or terms (tokens in a particular language)
• Term-weighting measures: Entropy, Inverse Document Frequency (IDF), Global Frequency-IDF (GF-IDF), None (global weight of 1), and Normal
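To make the weighting measures concrete, here is a rough sketch of three of them over a tokenized corpus. The formulas are commonly cited forms (IDF as log2(N/df) + 1, entropy as 1 plus the normalized sum of p log2 p, GF-IDF as global frequency divided by document frequency); whether SAS Text Miner computes them exactly this way is an assumption, and `term_weights` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def term_weights(docs):
    """Compute illustrative global term weights.

    `docs` is a list of token lists. Returns
    {term: {"idf": ..., "entropy": ..., "gf_idf": ...}}.
    Assumption: textbook formulas, not a vendor's exact definitions.
    """
    n_docs = len(docs)
    counts = [Counter(doc) for doc in docs]
    vocab = set().union(*docs)

    weights = {}
    for term in vocab:
        freqs = [c[term] for c in counts if term in c]
        df = len(freqs)   # number of documents containing the term
        gf = sum(freqs)   # total occurrences across the corpus
        idf = math.log2(n_docs / df) + 1
        if n_docs > 1:
            # p log2 p is summed over documents where the term occurs;
            # the result is near 0 for evenly spread terms and 1 for
            # terms concentrated in a single document.
            spread = sum((f / gf) * math.log2(f / gf) for f in freqs)
            entropy = 1 + spread / math.log2(n_docs)
        else:
            entropy = 1.0
        weights[term] = {"idf": idf, "entropy": entropy, "gf_idf": gf / df}
    return weights


# Hypothetical mini-corpus: one token list per transcribed description.
docs = [
    ["drusen", "macula"],
    ["drusen", "hemorrhage"],
    ["drusen", "drusen", "pigment"],
]
for term, w in sorted(term_weights(docs).items()):
    print(term, {k: round(v, 3) for k, v in w.items()})
```

In this sketch a term that appears in every document, such as "drusen", receives low IDF and entropy weights, while a term confined to one document receives high ones.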

Results: Text Miner
• Frequency weight: None
• Term weight: Normal

Common Terms

Comparison
• Text mining is thus a viable and effective method for determining the vocabulary that describes a particular disease
• Text mining found many of the terms that NLP found
• The human expert remains the best ground truth
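A comparison of this kind can be sketched with plain set operations over the term lists each method produces. The function `compare_term_sets`, the recall measure against the expert list, and the sample terms are all hypothetical illustrations, not the paper's evaluation procedure or data.

```python
def compare_term_sets(expert, nlp, text_mining):
    """Summarize overlap between three term lists.

    Assumption: terms are already normalized (lowercased, same spelling),
    so exact string equality is a fair match criterion.
    """
    expert, nlp, text_mining = set(expert), set(nlp), set(text_mining)
    return {
        "shared_by_all": expert & nlp & text_mining,
        "nlp_and_text_mining": nlp & text_mining,
        # Terms an automated method found that the expert list lacks.
        "missed_by_expert": (nlp | text_mining) - expert,
        # Fraction of expert terms recovered by each automated method.
        "nlp_recall": len(nlp & expert) / len(expert) if expert else 0.0,
        "text_mining_recall": (
            len(text_mining & expert) / len(expert) if expert else 0.0
        ),
    }


# Hypothetical term lists for illustration only.
expert = ["drusen", "hemorrhage", "pigment"]
nlp = ["drusen", "hemorrhage", "subretinal fluid"]
text_mining = ["drusen", "pigment", "subretinal fluid"]

summary = compare_term_sets(expert, nlp, text_mining)
print(sorted(summary["missed_by_expert"]))
print(round(summary["text_mining_recall"], 2))
```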

Ontology Generation

Conclusion and Future Work
• Human experts are the best, but they did miss some key descriptors
• Text mining and NLP can enhance the generation of feature descriptions by catching descriptors that the experts miss
• As a consequence, a more robust vocabulary can be generated
• Extension: evaluate the effectiveness of the automated tools, text mining and NLP
• Different weighting schemes are to be tried in the future

Thank You For Your Attention!