Bio-Medical Text Mining with Python Jaganadh G Carlos Rodriguez-Penagos
Talk outline ➢ Introduction ➢ Python and Bio-informatics ➢ Text Mining Experiments with Python ➢ Bioreader ➢ NCBI NLP Services Python API ➢ Simple play with Medical ontology
Introduction ➢ BioNLP, or Bio-Medical Text Mining, is a recent research field at the intersection of Natural Language Processing, bio-informatics, medical informatics and computational linguistics. ➢ It applies text mining techniques to literature in the biomedical and molecular biology domains ➢ Major areas of work ➢ Named Entity Recognition ➢ Information Retrieval and Extraction ➢ Medical ontology processing ➢ Text classification
Bioreader ● Bioreader is a Python library developed by Carlos for teaching bio-medical text processing ● Submitted to nltk_contrib ● Undergoing re-writing and enhancement
Bioreader... ● Bioreader is a module that allows creation of a biomedical corpus from keyword queries or PMID lists ● It also parses the PubMed and MEDLINE XML formats ● Can be used to create a bio-medical corpus on the fly for different NLP tasks
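As a rough illustration of what parsing PubMed XML into a small corpus involves, here is a minimal sketch using only the Python standard library. It is not Bioreader's implementation: the function names `parse_pubmed_xml` and `search` are hypothetical, and the sample record is hand-written placeholder content that only follows the PubMed element layout (`PubmedArticle`, `MedlineCitation`, `PMID`, `ArticleTitle`, `AbstractText`).

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the PubMed XML layout (placeholder content, not real records).
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>00000001</PMID>
      <Article>
        <ArticleTitle>Placeholder title about blood cancer</ArticleTitle>
        <Abstract>
          <AbstractText>Placeholder abstract text.</AbstractText>
        </Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>00000002</PMID>
      <Article>
        <ArticleTitle>Second placeholder title</ArticleTitle>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def parse_pubmed_xml(xml_text):
    """Return {pmid: {"title": ..., "abstract": ...}} for each PubmedArticle."""
    root = ET.fromstring(xml_text)
    corpus = {}
    for article in root.iter("PubmedArticle"):
        pmid = article.findtext("MedlineCitation/PMID")
        title = article.findtext("MedlineCitation/Article/ArticleTitle", default="")
        # An abstract may have several AbstractText sections, or be missing entirely.
        parts = article.findall("MedlineCitation/Article/Abstract/AbstractText")
        abstract = " ".join("".join(p.itertext()).strip() for p in parts)
        corpus[pmid] = {"title": title, "abstract": abstract}
    return corpus


def search(corpus, word, field):
    """Return the PMIDs whose given field contains word (case-insensitive)."""
    return [pmid for pmid, rec in corpus.items() if word.lower() in rec[field].lower()]


corpus = parse_pubmed_xml(SAMPLE_XML)
hits = search(corpus, "cancer", "title")
```

Keeping the corpus as a plain dictionary keyed by PMID makes it easy to hand titles or abstracts to a tokenizer or classifier later.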
Python interface to ncbi-nlp services ● The NCIBI NLP web services provide programmatic access to parsed and tagged text from the National Library of Medicine's (NLM) PubMed literature database ● Java- and Perl-based tools are available for accessing the service ● Literature can be retrieved by PMID ● Output is provided as XML
Python interface to ncbi-nlp services ● The interface can be accessed via ● Can be used to extract gene info from annotated biomedical literature
Demonstration ● Bioreader demonstration
>>> from bioreader import getPmidsByTerm
>>> br = getPmidsByTerm()
>>> term = "blood cancer"
>>> pmids = br.query(term)
>>> from bioreader import CreateXML
>>> xmlc = CreateXML()
>>> xmlc.generateFile("absQAW.xml", list=pmids)
>>> from bioreader import DataContainer
>>> dc = DataContainer("absQAW.xml", "pubmed")
>>> dc.search("cancer", "title")  # Enhance
>>> dc.keys
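The keyword-to-PMID step in the demonstration can be approximated without Bioreader by calling the NCBI E-utilities ESearch endpoint. The sketch below is an assumption about how such a lookup might be done, not a description of what `getPmidsByTerm` does internally. The helper names `build_esearch_url` and `parse_esearch_ids` are hypothetical, and the response shown is a hand-written placeholder in the ESearch layout so that the example runs offline.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, retmax=20):
    """Build an ESearch URL that asks PubMed for PMIDs matching term."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "xml"}
    return ESEARCH_URL + "?" + urlencode(params)


def parse_esearch_ids(xml_text):
    """Extract the PMID strings from an ESearch XML response."""
    root = ET.fromstring(xml_text)
    return [id_elem.text for id_elem in root.findall("IdList/Id")]


# Hand-written response in the ESearch layout; the IDs are placeholders.
SAMPLE_RESPONSE = """<eSearchResult>
  <Count>2</Count>
  <IdList>
    <Id>00000001</Id>
    <Id>00000002</Id>
  </IdList>
</eSearchResult>"""

url = build_esearch_url("blood cancer")
pmids = parse_esearch_ids(SAMPLE_RESPONSE)
```

To run against the live service, the text returned by `urllib.request.urlopen(url).read()` can be passed to `parse_esearch_ids` in place of `SAMPLE_RESPONSE`.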
NCBI-NLP service + Python demo
>>> from pubmednlp import PubmedNlp
>>> nlp = PubmedNlp()
>>> nlp.getMetaData(" ")
>>> nlp.getAbstract(pmid=' ')
Experiment with medical ontology
Used the MEDLINE data + some semantic web programming techniques
>>> from simplegraph import SimpleGraph
>>> graph = SimpleGraph()
>>> graph.load('SRSTRE2')
>>> graph.value(None,"affects","Fish")
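The query in this experiment treats the ontology as a set of (subject, predicate, object) triples and uses `None` as a wildcard. The following is a minimal sketch of that idea, not the `simplegraph` module itself. The class `TinyGraph` and its methods are hypothetical, the pipe-delimited `subject|predicate|object|` layout is an assumption about the input file, and the triples are toy data invented for illustration.

```python
class TinyGraph:
    """Minimal in-memory triple store with wildcard lookup (None matches anything)."""

    def __init__(self):
        self._triples = []

    def add(self, sub, pred, obj):
        self._triples.append((sub, pred, obj))

    def load_pipe_delimited(self, lines):
        """Load 'subject|predicate|object|' lines (assumed layout); skip malformed lines."""
        for line in lines:
            fields = line.strip().split("|")
            if len(fields) >= 3 and all(fields[:3]):
                self.add(*fields[:3])

    def triples(self, sub=None, pred=None, obj=None):
        """Yield every stored triple that matches the non-None arguments."""
        for s, p, o in self._triples:
            if sub in (None, s) and pred in (None, p) and obj in (None, o):
                yield (s, p, o)

    def value(self, sub=None, pred=None, obj=None):
        """Return the first value filling the first None slot, or None if nothing matches."""
        for s, p, o in self.triples(sub, pred, obj):
            if sub is None:
                return s
            if pred is None:
                return p
            if obj is None:
                return o
        return None


# Toy triples, invented for illustration only.
TOY_LINES = [
    "Toy Type A|affects|Fish|",
    "Toy Type B|affects|Fish|",
    "Toy Type A|part_of|Toy Type C|",
    "",
]

graph = TinyGraph()
graph.load_pipe_delimited(TOY_LINES)
first_subject = graph.value(None, "affects", "Fish")
all_subjects = [s for s, _, _ in graph.triples(None, "affects", "Fish")]
```

A list scan is enough for a demonstration; a store meant for a full ontology file would normally index triples by subject, predicate and object to avoid scanning everything on each query.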
Future direction ● Interface to other bio-medical literature based web services ● Bio-medical ontology processing with Python – Example and demonstration ● Mutation identification
Future ➢ Enhanced search facility for Bioreader ➢ Interface for parsed MEDLINE data access ➢ Bug fixes :-)
Conclusion Questions? Suggestions?