Show & Tell
Datamining: Turning Biological Data into Gold
Limsoon Wong, KRDL
What is Datamining?
  Jonathan’s blocks
  Jessica’s blocks
  Whose block is this?
  Jonathan’s rules: Blue or Circle
  Jessica’s rules: All the rest
What is Datamining?
  Question: Can you explain how?
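The block rules from the earlier slide (Jonathan: blue or circle; Jessica: all the rest) can be written as a tiny classifier. This is only a sketch of the idea: the function name `owner` and the example colours and shapes other than "blue" and "circle" are assumptions made for illustration.

```python
def owner(colour, shape):
    """Apply the two rules from the slide to a single block."""
    # Jonathan's rule: the block is blue OR it is a circle.
    if colour == "blue" or shape == "circle":
        return "Jonathan"
    # Jessica's rule: all the rest.
    return "Jessica"


# Hypothetical blocks, described as (colour, shape).
blocks = [("blue", "square"), ("red", "circle"), ("red", "square"), ("blue", "circle")]
labels = [owner(colour, shape) for colour, shape in blocks]
```

Datamining runs in the opposite direction: given only the labelled blocks, it tries to recover rules like these.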
What are the Benefits?
  To the patient: Better drug, better treatment
  To the pharma: Save time, save cost, make more $
  To the scientist: Better science
The Datamining Process
Epitope Prediction
TRAP-559AA
MNHLGNVKYLVIVFLIFFDLFLVNGRDVQNNIVDEIKYSE
EVCNDQVDLYLLMDCSGSIRRHNWVNHAVPLAMKLIQQLN
LNDNAIHLYVNVFSNNAKEIIRLHSDASKNKEKALIIIRS
LLSTNLPYGRTNLTDALLQVRKHLNDRINRENANQLVVIL
TDGIPDSIQDSLKESRKLSDRGVKIAVFGIGQGINVAFNR
FLVGCHPSDGKCNLYADSAWENVKNVIGPFMKAVCVEVEK
TASCGVWDEWSPCSVTCGKGTRSRKREILHEGCTSEIQEQ
CEEERCPPKWEPLDVPDEPEDDQPRPRGDNSSVQKPEENI
IDNNPQEPSPNPEEGKDENPNGFDLDENPENPPNPDIPEQ
KPNIPEDSEKEVPSDVPKNPEDDREENFDIPKKPENKHDN
QNNLPNDKSDRNIPYSPLPPKVLDNERKQSDPQSQDNNGN
RHVPNSEDRETRPHGRNNENRSYNRKYNDTPKHPEREEHE
KPDNNKKKGESDNKYKIAGGIAGGLALLACAGLAYKFVVP
GAATPYAGEPAPFDETLGEEDKDLDEPEQFRLPEENEWN
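Before any predictor can score a protein such as TRAP, the sequence has to be cut into candidate peptides. A minimal sketch of that step follows. The window length of 9 is an assumption (the slides do not state the peptide length), `peptide_windows` is a hypothetical helper name, and only the first 20 residues of the sequence are used to keep the example short.

```python
def peptide_windows(sequence, length=9):
    """Yield (1-based start position, peptide) for every window of the given length."""
    for start in range(len(sequence) - length + 1):
        yield start + 1, sequence[start:start + length]


# First 20 residues of the TRAP sequence shown on the slide.
trap_prefix = "MNHLGNVKYLVIVFLIFFDL"

# Assumed window length of 9; each candidate would then be scored by a predictor.
candidates = list(peptide_windows(trap_prefix))
```

Each candidate peptide would then be passed to a scoring model; the scoring step itself is not sketched here because the slides give no details of the model.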
Epitope Prediction Results
Prediction by our ANN model for HLA-A11
  29 predictions
  22 epitopes
  76% specificity
Prediction by BIMAS matrix for HLA-A*1101
  Number of experimental binders, grouped by BIMAS rank:
  19 (52.8%)
  5 (13.9%)
  12 (33.3%)
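The percentages on this slide can be checked with a few lines of arithmetic. The sketch assumes that the 76% figure is the ratio of confirmed epitopes to predictions (22 of 29) and that the three BIMAS counts are shares of one total; both readings are inferred from the numbers, not stated in the slides.

```python
# Figures copied from the slide.
ann_predictions = 29
ann_epitopes = 22
bimas_counts = [19, 5, 12]

# Assumed reading: fraction of ANN predictions that turned out to be epitopes.
ann_hit_rate = ann_epitopes / ann_predictions

# Assumed reading: each BIMAS count as a share of all experimental binders.
total_binders = sum(bimas_counts)
shares = [round(100 * count / total_binders, 1) for count in bimas_counts]
```

Under these assumptions the ratio 22/29 rounds to 76%, and the three counts sum to 36 binders with shares of 52.8%, 13.9% and 33.3%, matching the slide.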
Gene Expression Analysis
  Clustering gene expression profiles
  Classifying gene expression profiles
  Finding stable differentially expressed genes
Gene Expression Analysis Results
The Discovery System
  Correlation test
  Voter selection
  Class prediction
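The three stages named on this slide (correlation test, voter selection, class prediction) can be illustrated with a weighted-voting sketch. This is an assumption about the general shape of such a pipeline, not the Discovery System's actual implementation: the signal-to-noise score, the function names and the toy expression values are all chosen for illustration.

```python
from statistics import mean, pstdev


def signal_to_noise(values_a, values_b):
    """Correlation test: how strongly one gene separates class A from class B."""
    spread = pstdev(values_a) + pstdev(values_b)
    return (mean(values_a) - mean(values_b)) / spread if spread else 0.0


def select_voters(profiles_a, profiles_b, k):
    """Voter selection: keep the k genes with the largest absolute score."""
    scored = []
    for gene in range(len(profiles_a[0])):
        a = [profile[gene] for profile in profiles_a]
        b = [profile[gene] for profile in profiles_b]
        midpoint = (mean(a) + mean(b)) / 2
        scored.append((gene, signal_to_noise(a, b), midpoint))
    scored.sort(key=lambda item: abs(item[1]), reverse=True)
    return scored[:k]


def predict(sample, voters):
    """Class prediction: each voter gene casts a weighted vote for A or B."""
    total = sum(score * (sample[gene] - midpoint) for gene, score, midpoint in voters)
    return "A" if total > 0 else "B"


# Hypothetical expression profiles: rows are samples, columns are genes.
class_a = [[9.0, 1.0, 5.0], [8.0, 2.0, 5.5], [10.0, 1.5, 4.5]]
class_b = [[2.0, 8.0, 5.0], [1.0, 9.0, 4.8], [3.0, 7.0, 5.2]]

voters = select_voters(class_a, class_b, k=2)
label = predict([8.5, 2.0, 5.0], voters)
```

In the toy data, genes 0 and 1 differ between the classes while gene 2 does not, so only the first two are kept as voters.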
Protein Interaction Extraction
“What are the protein-protein interaction pathways from the latest reported discoveries?”
Protein Interaction Extraction Results
Rule-based system for processing free text in scientific abstracts
Specialized in
  extracting protein names
  extracting protein-protein interactions
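A rule-based extractor of this kind can be pictured as pattern matching over sentences. The sketch below is a toy illustration only: the protein list, the interaction verbs and the single "protein verb protein" rule are assumptions, and the real system's rules are not described in the slides.

```python
import re

# Hypothetical dictionary of protein names and interaction verbs.
KNOWN_PROTEINS = ["p53", "MDM2", "RAF1", "MEK1"]
INTERACTION_VERBS = ["binds to", "binds", "interacts with", "activates",
                     "inhibits", "phosphorylates"]

name = "|".join(re.escape(protein) for protein in KNOWN_PROTEINS)
verb = "|".join(re.escape(v) for v in INTERACTION_VERBS)

# One toy rule: <protein> <interaction verb> <protein>.
RULE = re.compile(rf"\b({name})\b\s+({verb})\s+\b({name})\b")


def extract_interactions(text):
    """Return (protein, verb, protein) triples matched by the toy rule."""
    return [match.groups() for match in RULE.finditer(text)]


# Illustrative sentences, not taken from any real abstract.
abstract = "We show that MDM2 binds to p53 in vivo. RAF1 phosphorylates MEK1."
interactions = extract_interactions(abstract)
```

A realistic system needs many more rules, for example for passive voice, name variants and negation, which is why the slide describes a specialized rule base rather than a single pattern.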
Transcription Start Prediction
Transcription Start Prediction Results
Medical Record Analysis
Looking for patterns that are
  valid
  novel
  useful
  understandable
Medical Record Analysis Results
  DeEPs, a novel “emerging pattern” method
  Beats C4.5, CBA, LB, NB, TAN in 21 out of 32 UCI benchmarks
  Works for gene expression data
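An emerging pattern is a set of items whose support rises sharply from one class of records to another, usually measured by a growth rate (support in the target class divided by support in the background class). The sketch below illustrates only that notion; it is not the DeEPs algorithm, and the records and item names are hypothetical.

```python
def support(pattern, records):
    """Fraction of records that contain every item in the pattern."""
    if not records:
        return 0.0
    items = frozenset(pattern)
    return sum(items <= set(record) for record in records) / len(records)


def growth_rate(pattern, background, target):
    """Support in the target class divided by support in the background class."""
    s_background = support(pattern, background)
    s_target = support(pattern, target)
    if s_background == 0:
        return float("inf") if s_target > 0 else 0.0
    return s_target / s_background


# Hypothetical medical records, each a set of observed findings.
healthy = [{"cough"}, {"fever", "rash"}, {"fatigue"}, {"cough", "fever"}]
patients = [{"fever", "rash"}, {"fever", "rash", "cough"},
            {"fever", "rash", "fatigue"}, {"cough"}]

rate = growth_rate({"fever", "rash"}, background=healthy, target=patients)
```

Here the pattern {fever, rash} appears in 1 of 4 background records and 3 of 4 target records, giving a growth rate of 3. Patterns with a high growth rate are the ones that discriminate between the classes.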
Under the Hood
  Artificial neural network
  Neighbourhood analysis
  Non-linear analysis
  Template matching
  Emerging pattern
  Hidden Markov models
  Bayesian inference
  Decision tree induction
  ...
Behind the Scene
Epitope Prediction
  Vladimir Brusic
  Judice Koh
  Seah Seng Hong
  Zhang Guanglan
  Yu Kun
Transcription Start Prediction
  Vladimir Bajic
  Seah Seng Hong
Gene Expression Analysis
  Zhang Louxin
  Zhang Zhuo
  Zhu Song
Medical Records
  Li Jinyan
Protein Interaction Extraction
  Ng See Kiong
  Zhang Zhuo