Published by Penelope Banks. Modified over 8 years ago.
1
Bio-Medical Text Mining with Python. Jaganadh G and Carlos Rodriguez-Penagos
2
Talk outline ➢ Introduction ➢ Python and Bio-informatics ➢ Text Mining Experiments with Python ➢ Bioreader ➢ NCBI NLP Services Python API ➢ Simple play with Medical ontology
3
Introduction ➢ BioNLP, or biomedical text mining, is a recent research field at the intersection of natural language processing, bioinformatics, medical informatics and computational linguistics. ➢ It applies text mining techniques to literature in the biomedical and molecular biology domains. ➢ Major areas of work: ➢ Named entity recognition ➢ Information retrieval and extraction ➢ Medical ontology processing ➢ Text classification ...
4
Bioreader ● Bioreader is a Python library developed by Carlos for teaching biomedical text processing ● Submitted to nltk_contrib ● Currently being rewritten and enhanced
5
Bioreader... ● Bioreader is a module for building a biomedical corpus from keyword queries or PMID lists ● It also parses the PubMed and MEDLINE XML formats ● It can be used to create a biomedical corpus on the fly for different NLP tasks
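To make the XML parsing step concrete, here is a minimal sketch using only the standard library. It is not Bioreader's own code: the function name `parse_pubmed_xml` and the two sample records are made up for illustration, and only the element names (`PubmedArticle`, `MedlineCitation`, `PMID`, `ArticleTitle`, `AbstractText`) follow the PubMed XML layout.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the PubMed XML layout (illustrative, not real records).
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>1</PMID>
      <Article>
        <ArticleTitle>Example title about cancer</ArticleTitle>
        <Abstract>
          <AbstractText>First part.</AbstractText>
          <AbstractText>Second part.</AbstractText>
        </Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>2</PMID>
      <Article>
        <ArticleTitle>Record without an abstract</ArticleTitle>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def parse_pubmed_xml(xml_text):
    """Return {pmid: {"title": ..., "abstract": ...}} for each PubmedArticle."""
    root = ET.fromstring(xml_text)
    corpus = {}
    for article in root.iter("PubmedArticle"):
        pmid = article.findtext("MedlineCitation/PMID")
        title = article.findtext("MedlineCitation/Article/ArticleTitle", default="")
        # An abstract may be split into several AbstractText sections.
        sections = article.findall("MedlineCitation/Article/Abstract/AbstractText")
        abstract = " ".join("".join(s.itertext()).strip() for s in sections)
        corpus[pmid] = {"title": title, "abstract": abstract}
    return corpus


corpus = parse_pubmed_xml(SAMPLE_XML)
```

The resulting dictionary is keyed by PMID, so a corpus built this way can be filtered or fed to a tokenizer one record at a time.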
6
Python interface to ncbi-nlp services ● The NCIBI NLP web services provide programmatic access to parsed and tagged text from the National Library of Medicine's (NLM) PubMed literature database ● Java- and Perl-based tools for accessing the service are already available ● Literature can be retrieved by PMID ● The service returns XML output
7
Python interface to ncbi-nlp services... ● The interface can be accessed via http://nlp.ncibi.org ● It can be used to extract gene information from annotated biomedical literature
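As a rough illustration of pulling gene information out of annotated XML, the sketch below counts gene mentions in a small document. The slides do not show the service's actual schema, so the `<document>`, `<sentence>` and `<gene symbol="...">` markup here is entirely hypothetical, as is the function name `count_gene_mentions`.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Hypothetical annotated output: the real service schema is not shown in the slides.
SAMPLE_ANNOTATED = """<document pmid="0">
  <sentence>Mutations in <gene symbol="TP53">p53</gene> are common.</sentence>
  <sentence><gene symbol="TP53">TP53</gene> interacts with <gene symbol="MDM2">Mdm2</gene>.</sentence>
</document>"""


def count_gene_mentions(xml_text):
    """Count how often each gene symbol is tagged in the annotated text."""
    root = ET.fromstring(xml_text)
    return Counter(node.get("symbol") for node in root.iter("gene"))


gene_counts = count_gene_mentions(SAMPLE_ANNOTATED)
```

With a real response, only the tag and attribute names passed to `iter` and `get` would need to change to match the service's schema.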
8
Demonstration ● Bioreader demonstration
>>> from bioreader import getPmidsByTerm
>>> br = getPmidsByTerm()
>>> term = "blood cancer"
>>> pmids = br.query(term)
>>> from bioreader import CreateXML
>>> xmlc = CreateXML()
>>> xmlc.generateFile("absQAW.xml", list=pmids)
>>> from bioreader import DataContainer
>>> dc = DataContainer("absQAW.xml", "pubmed")
>>> dc.search("cancer", "title")  # search facility to be enhanced
>>> dc.keys
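For readers without Bioreader at hand, the keyword-to-PMID step in the demonstration can be approximated with the NCBI E-utilities `esearch` endpoint. This is a sketch under that assumption, not a description of how `getPmidsByTerm` works internally; the helper names and the sample response are made up for illustration, and no network call is made here.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, retmax=20):
    """Build an esearch query URL for PubMed (hypothetical helper)."""
    params = {"db": "pubmed", "term": term, "retmax": retmax}
    return ESEARCH_URL + "?" + urlencode(params)


def parse_id_list(xml_text):
    """Extract the PMIDs from an esearch XML response."""
    root = ET.fromstring(xml_text)
    return [node.text for node in root.findall("IdList/Id")]


# Hand-written response in the esearch layout (the IDs are placeholders).
SAMPLE_RESPONSE = """<eSearchResult>
  <Count>2</Count>
  <IdList>
    <Id>111</Id>
    <Id>222</Id>
  </IdList>
</eSearchResult>"""

url = build_esearch_url("blood cancer")
pmids = parse_id_list(SAMPLE_RESPONSE)
```

In a live session the URL would be fetched, for example with `urllib.request.urlopen(url).read()`, and the response body passed to `parse_id_list`.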
9
NCBI-NLP service + Python demo
>>> from pubmednlp import PubmedNlp
>>> nlp = PubmedNlp()
>>> nlp.getMetaData("17523140")
>>> nlp.getAbstract(pmid='17523140')
10
Experiment with medical ontology ● Uses MEDLINE data plus some semantic web programming techniques
>>> from simplegraph import SimpleGraph
>>> graph = SimpleGraph()
>>> graph.load('SRSTRE2')
>>> graph.value(None, "affects", "Fish")
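The query in this experiment is a wildcard lookup over subject, relation, object triples: `None` marks the slot to be filled in. The sketch below shows the idea with a tiny in-memory store. `TinyGraph` is a stand-in written for illustration, not the `SimpleGraph` class used in the talk, and the pipe-delimited sample lines are invented rather than taken from the actual SRSTRE2 file.

```python
class TinyGraph:
    """Minimal in-memory triple store with wildcard lookup (illustrative)."""

    def __init__(self):
        self._triples = set()

    def add(self, sub, pred, obj):
        self._triples.add((sub, pred, obj))

    def load_lines(self, lines):
        """Load pipe-delimited 'subject|relation|object|' lines."""
        for line in lines:
            fields = line.strip().split("|")
            if len(fields) >= 3 and all(fields[:3]):
                self.add(*fields[:3])

    def triples(self, sub=None, pred=None, obj=None):
        """Yield matching triples in sorted order; None matches anything."""
        pattern = (sub, pred, obj)
        for triple in sorted(self._triples):
            if all(want is None or want == got
                   for want, got in zip(pattern, triple)):
                yield triple

    def value(self, sub=None, pred=None, obj=None):
        """Return the first value found for the slot left as None."""
        for s, p, o in self.triples(sub, pred, obj):
            if sub is None:
                return s
            if pred is None:
                return p
            if obj is None:
                return o
        return None


# Invented sample lines, not actual file contents.
SAMPLE_LINES = [
    "Acquired Abnormality|affects|Fish|",
    "Bacterium|affects|Fish|",
    "Fish|isa|Vertebrate|",
]

graph = TinyGraph()
graph.load_lines(SAMPLE_LINES)
first_subject = graph.value(None, "affects", "Fish")
```

Sorting the triples before matching makes `value` deterministic, which is convenient for a demo; a store over a large ontology would more likely index triples by subject, relation and object.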
11
Future direction ● Interfaces to other web services built on biomedical literature ● Biomedical ontology processing with Python – example and demonstration ● Mutation identification
12
Future ➢ Enhanced search facility for Bioreader ➢ Interface for accessing parsed MEDLINE data: http://www-tsujii.is.s.u-tokyo.ac.jp/enju-medline/ ➢ Bug fixes :-)
13
References ● Python for Bioinformatics, Sebastian Bassi, CRC Press. ● Bioinformatics Programming Using Python, Mitchell L. Model, O'Reilly. ● Java for Bioinformatics and Biomedical Applications, Harshawardhan Bal and Johnny Hujol, Springer.
14
Conclusion Questions? Suggestions? jaganadhg@gmail.com