CICC Chemical Compound Mining Workflows
Jungkee (Jake) Kim
Community Grids Laboratory
A Workflow for Big Red Demo I
[Workflow diagram]
  Stages: PubMed Abstracts, OSCAR3, SMILES Extraction, Converting the format, Molecular & Quantum Mechanics, Converting to pictures, Generating HTML script
  Files passed between stages: text files, XML files, SMILES, SDF files, SDF files, POV and JPG files
“Big Red” is one of the fastest supercomputers
Mining chemical compounds found in research paper texts and showing them in 3D graphics
10/06/2006 CICC Project Meeting
A Workflow for Big Red Demo II
Final HTML pages
A Workflow for Big Red Demo III
PubMed abstracts (R. Guha)
  555,007 PubMed abstracts from 2005 – 2006 (part)
  1,000 abstracts distributed per node (simple parallelism)
  511 nodes × 1,000 input abstracts used for the demo
OSCAR3
  A Cambridge tool that extracts chemical information from text and produces an XML instance highlighting the chemical information
  Used a revised version for convenient batch processing (some incompatibility with the Big Red architecture)
SMILES extraction
  Extracting SMILES elements from OSCAR’s XML output files
  Unique SMILES list within a batch
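The batching and SMILES-extraction steps on this slide can be sketched in a few lines of Python. This is only an illustration, not the code used in the demo: the function names `batches` and `unique_smiles` are hypothetical, and the sketch assumes that each SMILES string sits in an XML attribute named `SMILES`, which may not match the exact layout of OSCAR3's output.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Iterator, List

BATCH_SIZE = 1000  # abstracts per node, as stated on the slide


def batches(items: List[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items, one slice per node."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def unique_smiles(xml_documents: Iterable[str], attr: str = "SMILES") -> List[str]:
    """Collect the distinct SMILES strings of one batch, in first-seen order.

    Assumption: every SMILES string is stored in an attribute named `attr`;
    the element that carries it is not checked.
    """
    seen = set()
    ordered = []
    for doc in xml_documents:
        root = ET.fromstring(doc)
        for element in root.iter():
            smiles = element.get(attr)
            if smiles and smiles not in seen:
                seen.add(smiles)
                ordered.append(smiles)
    return ordered
```

Each node would call `unique_smiles` on the XML files of its own batch, so duplicates are removed within a batch but not across batches, matching the "unique SMILES list within a batch" point above.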
A Workflow for Big Red Demo IV
Generating 3D formats (K. Gilbert)
  Converting from SMILES to SDF format
  Molecular Mechanics program: “mengine” (MM engine)
  No Quantum Mechanics (QM) in the demo
Converting 3D formats to pictures (J. N. Huffman)
  Persistence of Vision Raytracer (POV-Ray): converting SDF to POV
  Another program converts the POV files to JPEG format
Generating HTML script
  Showing those graphic files in an HTML page
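The final step, showing the rendered pictures in an HTML page, could look roughly like the sketch below. It is a minimal illustration under stated assumptions rather than the script used in the demo: the function name `build_gallery`, the page title, and the page layout are all hypothetical.

```python
from html import escape
from typing import Iterable


def build_gallery(image_names: Iterable[str], title: str = "Rendered compounds") -> str:
    """Return an HTML page that shows each image file with its name as caption."""
    items = []
    for name in sorted(image_names):
        # Escape the file name so unusual characters cannot break the markup.
        safe = escape(name, quote=True)
        items.append(
            f'  <figure><img src="{safe}" alt="{safe}">'
            f"<figcaption>{safe}</figcaption></figure>"
        )
    body = "\n".join(items)
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{escape(title)}</title></head>\n"
        f"<body>\n<h1>{escape(title)}</h1>\n{body}\n</body></html>\n"
    )
```

The returned string can be written to a file next to the JPEG files, for example with `pathlib.Path("index.html").write_text(build_gallery(names))`, where `names` is the list of image file names produced by the rendering step.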
Bigger Picture for the Workflow
[Diagram components]
  NIH PubMed Database
  OSCAR Text Analysis
  Cluster Grouping
  Toxicity Filtering
  Docking
  Initial 3D Structure Calculation
  High Throughput Screening (HTS) Data Organization and Flagging
  Molecular Mechanics Calculations
  Quantum Mechanics Calculations
  NIH PubChem Database
  Big Red Demo
  IU’s Varuna Database
  POV-Ray Parallel Rendering