Show & Tell Limsoon Wong KRDL Datamining: Turning Biological Data into Gold.

Presentation transcript:

Show & Tell Limsoon Wong KRDL Datamining: Turning Biological Data into Gold

What is Datamining?
- Jonathan’s blocks, Jessica’s blocks
- Whose block is this?
- Jonathan’s rules: Blue or Circle
- Jessica’s rules: All the rest
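The block game can be written down as a tiny rule-based classifier. The sketch below assumes each block is described only by a colour and a shape; the `Block` type and the `owner` function are illustrative names, not taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    colour: str  # e.g. "blue", "red" (assumed attribute)
    shape: str   # e.g. "circle", "square" (assumed attribute)


def owner(block: Block) -> str:
    """Apply the two rules from the slide to decide whose block this is."""
    # Jonathan's rule: blue OR circle.
    if block.colour == "blue" or block.shape == "circle":
        return "Jonathan"
    # Jessica's rule: all the rest.
    return "Jessica"


for b in (Block("blue", "square"), Block("red", "circle"), Block("red", "square")):
    print(b, "->", owner(b))
```

Datamining runs in the opposite direction: given only the two piles of blocks, it has to recover rules like these from the data.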

What is Datamining?
Question: Can you explain how?

What are the Benefits?
- To the patient:
  - Better drug, better treatment
- To the pharma:
  - Save time, save cost, make more $
- To the scientist:
  - Better science

The Datamining Process

Epitope Prediction
TRAP-559AA
MNHLGNVKYLVIVFLIFFDLFLVNGRDVQNNIVDEIKYSE
EVCNDQVDLYLLMDCSGSIRRHNWVNHAVPLAMKLIQQLN
LNDNAIHLYVNVFSNNAKEIIRLHSDASKNKEKALIIIRS
LLSTNLPYGRTNLTDALLQVRKHLNDRINRENANQLVVIL
TDGIPDSIQDSLKESRKLSDRGVKIAVFGIGQGINVAFNR
FLVGCHPSDGKCNLYADSAWENVKNVIGPFMKAVCVEVEK
TASCGVWDEWSPCSVTCGKGTRSRKREILHEGCTSEIQEQ
CEEERCPPKWEPLDVPDEPEDDQPRPRGDNSSVQKPEENI
IDNNPQEPSPNPEEGKDENPNGFDLDENPENPPNPDIPEQ
KPNIPEDSEKEVPSDVPKNPEDDREENFDIPKKPENKHDN
QNNLPNDKSDRNIPYSPLPPKVLDNERKQSDPQSQDNNGN
RHVPNSEDRETRPHGRNNENRSYNRKYNDTPKHPEREEHE
KPDNNKKKGESDNKYKIAGGIAGGLALLACAGLAYKFVVP
GAATPYAGEPAPFDETLGEEDKDLDEPEQFRLPEENEWN
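Before any model can score a protein such as this one, the sequence has to be cut into candidate peptides. The following is a minimal sketch of that step, assuming overlapping 9-residue windows; the window length is an assumption, as the slides do not state it, and `peptide_windows` is a hypothetical helper name.

```python
def peptide_windows(sequence: str, k: int = 9):
    """Yield (1-based start position, peptide) for every overlapping k-mer.

    k = 9 is an assumed window length, not a value given in the slides.
    """
    seq = "".join(sequence.split()).upper()  # drop spaces and line breaks
    for i in range(len(seq) - k + 1):
        yield i + 1, seq[i:i + k]


# First 40 residues of the TRAP sequence shown on the slide.
fragment = "MNHLGNVKYLVIVFLIFFDLFLVNGRDVQNNIVDEIKYSE"
candidates = list(peptide_windows(fragment))
print(len(candidates))   # number of 9-mers in the fragment
print(candidates[0])     # first candidate peptide and its position
```

Each candidate would then be passed to a predictor (an ANN or a scoring matrix, as in the results slide) to estimate whether it binds the HLA molecule.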

Epitope Prediction Results
- Prediction by our ANN model for HLA-A11
  - 29 predictions
  - 22 epitopes
  - 76% specificity
- Prediction by BIMAS matrix for HLA-A*1101
  - Rank by BIMAS / Number of experimental binders: 19 (52.8%), 5 (13.9%), 12 (33.3%)
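The percentages on this slide can be reproduced from the counts. The sketch below assumes that the 76% figure is the share of the 29 predictions that were confirmed as epitopes, and that the three BIMAS percentages are shares of the 36 experimental binders (19 + 5 + 12); `percent` is a hypothetical helper.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# ANN model: confirmed epitopes among all predictions.
print(percent(22, 29))  # the slide rounds this to 76%

# BIMAS: experimental binders per rank group, as a share of all binders.
binders = [19, 5, 12]
total = sum(binders)
print([percent(n, total) for n in binders])
```

Under that reading, the slide's "specificity" is the fraction of predictions that turned out to be correct.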

Gene Expression Analysis
- Clustering gene expression profiles
- Classifying gene expression profiles
- Find stable differentially expressed genes

Gene Expression Analysis Results
The Discovery System
- Correlation test
- Voter selection
- Class prediction
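The three stages named above (correlation test, voter selection, class prediction) can be illustrated with a weighted-voting classifier. The slides do not say how the Discovery System implements them, so this sketch borrows a signal-to-noise score and a midpoint-based vote as stand-ins; all function names and the toy data are assumptions.

```python
from statistics import mean, stdev


def signal_to_noise(a, b):
    """Correlation test for one gene: (mean_a - mean_b) / (sd_a + sd_b).

    Raises ZeroDivisionError if the gene is constant within both classes.
    """
    return (mean(a) - mean(b)) / (stdev(a) + stdev(b))


def select_voters(class_a, class_b, n_voters):
    """Voter selection: keep the n_voters genes with the largest |score|.

    class_a and class_b are lists of samples; each sample is a list of
    expression values, one per gene. Returns (gene index, score, midpoint).
    """
    voters = []
    for g in range(len(class_a[0])):
        a = [sample[g] for sample in class_a]
        b = [sample[g] for sample in class_b]
        voters.append((g, signal_to_noise(a, b), (mean(a) + mean(b)) / 2))
    voters.sort(key=lambda v: abs(v[1]), reverse=True)
    return voters[:n_voters]


def predict(sample, voters):
    """Class prediction: each voter casts a weighted vote; the sign decides."""
    total = sum(score * (sample[g] - mid) for g, score, mid in voters)
    return "A" if total > 0 else "B"


# Toy data (made up): 3 samples per class, 3 genes; only gene 0 separates them.
class_a = [[10, 5, 1.0], [12, 6, 2.0], [11, 4, 1.5]]
class_b = [[2, 5, 1.2], [3, 6, 1.8], [1, 4, 1.4]]
voters = select_voters(class_a, class_b, n_voters=1)
print(voters[0][0], predict([9, 5, 1.0], voters), predict([3, 5, 1.0], voters))
```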

Protein Interaction Extraction
“What are the protein-protein interaction pathways from the latest reported discoveries?”

Protein Interaction Extraction Results
- Rule-based system for processing free texts in scientific abstracts
- Specialized in
  - extracting protein names
  - extracting protein-protein interactions
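A rule-based extractor of this kind can be approximated with regular expressions. The sketch below is only an illustration of the idea, not the system described on the slide: the name heuristic (letters followed by digits, such as "p53"), the verb list, and the example sentence are all assumptions.

```python
import re

# Assumed heuristic: a protein name is letters followed by at least one digit.
NAME = r"[A-Za-z]+[0-9]+[A-Za-z0-9]*"
# Assumed list of interaction verbs; longer phrases come first.
VERB = r"interacts with|binds to|binds|phosphorylates|activates|inhibits"

INTERACTION = re.compile(rf"\b({NAME})\s+({VERB})\s+({NAME})\b")


def extract_interactions(text: str):
    """Return (protein, verb, protein) triples found by the single rule above."""
    return [m.groups() for m in INTERACTION.finditer(text)]


abstract = "We show that Mdm2 binds p53 and that Chk1 phosphorylates Cdc25 in vitro."
print(extract_interactions(abstract))
```

A practical system needs many more rules (name dictionaries, passive voice, negation), but the structure of name pattern plus interaction pattern stays the same.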

Transcription Start Prediction

Transcription Start Prediction Results

Medical Record Analysis
- Looking for patterns that are
  - valid
  - novel
  - useful
  - understandable

Medical Record Analysis Results
- DeEPs, a novel “emerging pattern” method
- Beats C4.5, CBA, LB, NB, TAN in 21 out of 32 UCI benchmarks
- Works for gene expressions
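An emerging pattern is a set of items whose support rises sharply from one class of records to another. The sketch below computes that growth rate on made-up records. It illustrates the concept only and is not the DeEPs algorithm; the function names and data are assumptions.

```python
def support(pattern, records):
    """Fraction of records (sets of items) that contain every item in pattern."""
    pattern = frozenset(pattern)
    return sum(pattern <= set(r) for r in records) / len(records)


def growth_rate(pattern, background, target):
    """Support in the target class divided by support in the background class."""
    s_bg = support(pattern, background)
    s_tg = support(pattern, target)
    if s_bg == 0:
        # Present only in the target class: unbounded growth (or absent everywhere).
        return float("inf") if s_tg > 0 else 0.0
    return s_tg / s_bg


# Toy records (made up): each record is the set of findings for one patient.
target = [{"fever", "cough"}, {"fever", "cough", "rash"}, {"fever"}, {"cough"}]
background = [{"fever"}, {"rash"}, {"cough"}, {"fever", "cough"}]

print(growth_rate({"fever", "cough"}, background, target))
print(growth_rate({"fever", "rash"}, background, target))
```

Patterns with a high growth rate are strong evidence for the target class, which is what makes them usable for classification.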

Under the Hood
- Artificial neural network
- Neighbourhood analysis
- Non-linear analysis
- Template matching
- Emerging pattern
- Hidden Markov models
- Bayesian inference
- Decision tree induction
- ...

Behind the Scene
- Epitope Prediction
  - Vladimir Brusic
  - Judice Koh
  - Seah Seng Hong
  - Zhang Guanglan
  - Yu Kun
- Transcription Start Prediction
  - Vladimir Bajic
  - Seah Seng Hong
- Gene Expression Analysis
  - Zhang Louxin
  - Zhang Zhuo
  - Zhu Song
- Medical Records
  - Li Jinyan
- Protein Interaction Extraction
  - Ng See Kiong
  - Zhang Zhuo