Limsoon Wong Laboratories for Information Technology Singapore From Informatics to Bioinformatics.


Presentation transcript:

Limsoon Wong Laboratories for Information Technology Singapore From Informatics to Bioinformatics

What is Bioinformatics?

Themes of Bioinformatics
Bioinformatics = Data Mgmt + Knowledge Discovery
Data Mgmt = Integration + Transformation + Cleansing
Knowledge Discovery = Statistics + Algorithms + Databases

Benefits of Bioinformatics
To the patient: Better drug, better treatment
To the pharma: Save time, save cost, make more $
To the scientist: Better science

From Informatics to Bioinformatics
Integration Technology (Kleisli)
Cleansing & Warehousing (FIMM)
MHC-Peptide Binding (PREDICT)
Protein Interactions Extraction (PIES)
Gene Expression & Medical Record Datamining (PCL)
Gene Feature Recognition (Dragon)
Venom Informatics
years of bioinformatics R&D in Singapore
ISS KRDL LIT

Data Integration A DOE “impossible query”: For each gene on a given cytogenetic band, find its non-human homologs.

Data Integration Results

sybase-add (#name: "GDB",...);
create view L from locus_cyto_location using GDB;
create view E from object_genbank_eref using GDB;
select
  #accn: g.#genbank_ref,
  #nonhuman-homologs: H
from
  L as c,
  E as g,
  {select u
   from g.#genbank_ref.na-get-homolog-summary as u
   where not(u.#title string-islike "%Human%")
   andalso not(u.#title string-islike "%H.sapien%")} as H
where
  c.#chrom_num = "22"
  andalso g.#object_id = c.#locus_id
  andalso not (H = { });

Using Kleisli: clear, succinct, efficient; handles heterogeneity and complexity.
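To make the shape of the query easier to follow, here is a minimal Python sketch of the same join-and-filter logic. It is not Kleisli and does not contact GDB or GenBank: the three in-memory collections, their contents, and the helper names `get_homolog_summary` and `nonhuman_homologs` are all hypothetical stand-ins, and `string-islike "%...%"` is assumed to behave like a plain substring test.

```python
# Hypothetical stand-ins for the two GDB tables and the GenBank homolog lookup.
locus_cyto_location = [
    {"locus_id": 1, "chrom_num": "22"},
    {"locus_id": 2, "chrom_num": "22"},
    {"locus_id": 3, "chrom_num": "7"},
]
object_genbank_eref = [
    {"object_id": 1, "genbank_ref": "ACC_A"},
    {"object_id": 2, "genbank_ref": "ACC_B"},
    {"object_id": 3, "genbank_ref": "ACC_C"},
]
homolog_summaries = {
    "ACC_A": [{"title": "Mouse gene x"}, {"title": "Human gene x"}],
    "ACC_B": [{"title": "H.sapiens gene y"}],
    "ACC_C": [{"title": "Rat gene z"}],
}


def get_homolog_summary(accession):
    """Hypothetical replacement for the remote homolog lookup."""
    return homolog_summaries.get(accession, [])


def nonhuman_homologs(chrom_num):
    """For each locus on the chromosome, collect homologs whose title is not human."""
    results = []
    for c in locus_cyto_location:
        if c["chrom_num"] != chrom_num:
            continue
        for g in object_genbank_eref:
            if g["object_id"] != c["locus_id"]:
                continue
            # Nested subquery: keep only summaries without a human marker in the title.
            h = [
                u for u in get_homolog_summary(g["genbank_ref"])
                if "Human" not in u["title"] and "H.sapien" not in u["title"]
            ]
            if h:  # corresponds to: not (H = { })
                results.append({"accn": g["genbank_ref"], "nonhuman_homologs": h})
    return results


print(nonhuman_homologs("22"))
```

The point of the original query is that the two relational tables and the remote sequence lookup sit in different systems; the sketch flattens that heterogeneity into local dictionaries purely for illustration.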

Data Warehousing
Motivation: efficiency; availability; "denial of service"; data cleansing
Requirements: efficient to query; easy to update; model data naturally

{(#uid: ,
  #title: "Homo sapiens adrenergic...",
  #accession: "NM_001619",
  #organism: "Homo sapiens",
  #taxon: 9606,
  #lineage: ["Eukaryota", "Metazoa", …],
  #seq: "CTCGGCCTCGGGCGCGGC...",
  #feature: {
    (#name: "source",
     #continuous: true,
     #position: [
       (#accn: "NM_001619", #start: 0, #end: 3602, #negative: false)],
     #anno: [
       (#anno_name: "organism", #descr: "Homo sapiens"), …] ),
    …)}

Data Warehousing Results
Relational DBMS is insufficient because it forces us to fragment data into 3NF.
Kleisli turns flat relational DBMS into nested relational DBMS.
It can use flat relational DBMS such as Sybase, Oracle, MySQL, etc. to be its update-able complex object store.

! Log in
oracle-cplobj-add (#name: "db",...);
! Define table
create table GP (#uid: "NUMBER", #detail: "LONG") using db;
! Populate table with GenPept reports
select #uid: x.#uid, #detail: x into GP from aa-get-seqfeat-general "PTP" as x using db;
! Map GP to that table
create view GP from GP using db;
! Run a query to get title of
select x.#detail.#title from GP as x where x.#uid = ;
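The idea of using a flat relational table as a store for whole nested records can be sketched in Python with the standard `sqlite3` and `json` modules. This is only an analogy under stated assumptions: SQLite stands in for Oracle, JSON text stands in for Kleisli's complex object encoding, and the sample record, the `store` helper, and the `title_of` helper are hypothetical.

```python
import json
import sqlite3

# A flat two-column table, mirroring GP (#uid, #detail) in the example above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE GP (uid INTEGER PRIMARY KEY, detail TEXT)")


def store(record):
    """Serialize the whole nested record into the single 'detail' column."""
    conn.execute(
        "INSERT INTO GP (uid, detail) VALUES (?, ?)",
        (record["uid"], json.dumps(record)),
    )


def title_of(uid):
    """Fetch one record by uid and read a field from the nested structure."""
    row = conn.execute("SELECT detail FROM GP WHERE uid = ?", (uid,)).fetchone()
    return None if row is None else json.loads(row[0])["title"]


# Hypothetical nested record; the real GenPept reports are far richer.
store({
    "uid": 1,
    "title": "Example protein report",
    "feature": [{"name": "source", "position": [{"start": 0, "end": 10}]}],
})

print(title_of(1))
```

The record is never fragmented into normalized tables: it goes in and comes out as one nested object, and the relational engine is used only for keyed storage and update.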

Epitope Prediction TRAP-559AA MNHLGNVKYLVIVFLIFFDLFLVNGRDVQNNIVDEIKYSE EVCNDQVDLYLLMDCSGSIRRHNWVNHAVPLAMKLIQQLN LNDNAIHLYVNVFSNNAKEIIRLHSDASKNKEKALIIIRS LLSTNLPYGRTNLTDALLQVRKHLNDRINRENANQLVVIL TDGIPDSIQDSLKESRKLSDRGVKIAVFGIGQGINVAFNR FLVGCHPSDGKCNLYADSAWENVKNVIGPFMKAVCVEVEK TASCGVWDEWSPCSVTCGKGTRSRKREILHEGCTSEIQEQ CEEERCPPKWEPLDVPDEPEDDQPRPRGDNSSVQKPEENI IDNNPQEPSPNPEEGKDENPNGFDLDENPENPPNPDIPEQ KPNIPEDSEKEVPSDVPKNPEDDREENFDIPKKPENKHDN QNNLPNDKSDRNIPYSPLPPKVLDNERKQSDPQSQDNNGN RHVPNSEDRETRPHGRNNENRSYNRKYNDTPKHPEREEHE KPDNNKKKGESDNKYKIAGGIAGGLALLACAGLAYKFVVP GAATPYAGEPAPFDETLGEEDKDLDEPEQFRLPEENEWN

Epitope Prediction Results
• Prediction by our ANN model for HLA-A11
  • 29 predictions
  • 22 epitopes
  • 76% specificity
• Prediction by BIMAS matrix for HLA-A*1101
  Number of experimental binders, by BIMAS rank: 19 (52.8%), 5 (13.9%), 12 (33.3%)
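The percentages on this slide can be checked with a few lines of Python. The sketch assumes that the 76% figure is the fraction of the 29 predictions that turned out to be epitopes, and that the three binder counts are rows of a single table; both readings are inferred from the numbers, not stated on the slide.

```python
# Figures copied from the slide; variable names are illustrative only.
predictions = 29
epitopes = 22
fraction_confirmed = epitopes / predictions

binders_per_rank_bin = [19, 5, 12]
total = sum(binders_per_rank_bin)
shares = [round(100 * n / total, 1) for n in binders_per_rank_bin]

print(round(100 * fraction_confirmed), total, shares)
```

Under these assumptions the three counts sum to 36 experimental binders, and the rounded shares match the percentages shown.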

Transcription Start Prediction

Transcription Start Prediction Results

Medical Record Analysis
• Looking for patterns that are
  • valid
  • novel
  • useful
  • understandable

Gene Expression Analysis
• Classifying gene expression profiles
  • find stable differentially expressed genes
  • find significant gene groups
  • derive coordinated gene expression

Medical Record & Gene Expression Analysis Results
• PCL, a novel "emerging pattern" method
• Beats C4.5, CBA, LB, NB, TAN in 21 out of 32 UCI benchmarks
• Works well for gene expressions
Cancer Cell, March 2002, 1(2)
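The slide does not describe how PCL works, so the following Python sketch illustrates only the general notion of an emerging pattern: a set of items whose support is much higher in one class than in another, commonly measured by the ratio of the two supports (the growth rate). The toy discretized expression data and the function names are hypothetical, and this is not the PCL classifier.

```python
def support(pattern, transactions):
    """Fraction of transactions that contain every item of the pattern."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if pattern <= t)
    return hits / len(transactions)


def growth_rate(pattern, background, target):
    """Support in the target class divided by support in the background class."""
    s_bg = support(pattern, background)
    s_tg = support(pattern, target)
    if s_bg == 0.0:
        return float("inf") if s_tg > 0.0 else 0.0
    return s_tg / s_bg


# Hypothetical discretized expression profiles, one set of items per sample.
target = [
    {"g1_high", "g2_low"},
    {"g1_high", "g2_high"},
    {"g1_high", "g2_low"},
]
background = [
    {"g1_low", "g2_low"},
    {"g1_high", "g2_low"},
    {"g1_low", "g2_high"},
    {"g1_low", "g2_low"},
]

print(growth_rate(frozenset({"g1_high"}), background, target))
```

A pattern with a large growth rate is a candidate signal for telling the two classes apart, which is why such patterns are attractive for classifying gene expression profiles.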

Protein Interaction Extraction “What are the protein-protein interaction pathways from the latest reported discoveries?”

Protein Interaction Extraction Results
• Rule-based system for processing free texts in scientific abstracts
• Specialized in
  • extracting protein names
  • extracting protein-protein interactions
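As a rough illustration of what a rule-based extractor does, here is a minimal Python sketch with a single pattern of the form "protein, interaction verb, protein". The name lexicon, the verb list, the rule, and the sample sentence are all hypothetical; the actual rules of the system described on the slide are not given in the source.

```python
import re

# Hypothetical lexicon and verb list; a real system would use far richer rules.
PROTEIN_NAMES = ["RAD51", "BRCA2", "MDM2", "p53"]
INTERACTION_VERBS = ["interacts with", "binds to", "binds", "phosphorylates", "inhibits"]

name_alt = "|".join(re.escape(n) for n in PROTEIN_NAMES)
verb_alt = "|".join(re.escape(v) for v in INTERACTION_VERBS)

# One rule: <protein> <interaction verb> <protein>, all within one clause.
RULE = re.compile(rf"\b({name_alt})\b\s+({verb_alt})\s+\b({name_alt})\b")


def extract_interactions(text):
    """Return (protein, verb, protein) triples matched by the single rule."""
    return [(m.group(1), m.group(2), m.group(3)) for m in RULE.finditer(text)]


sample = "BRCA2 interacts with RAD51. MDM2 binds to p53 and inhibits its activity."
print(extract_interactions(sample))
```

The sketch recognizes protein names only by exact lexicon lookup, whereas name recognition in free text is itself one of the two tasks the slide lists, so this covers the easy part of the problem at best.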

Behind the Scene
• Vladimir Bajic
• Vladimir Brusic
• Jinyan Li
• See-Kiong Ng
• Limsoon Wong
• Louxin Zhang
• Allen Chong
• Judice Koh
• SPT Krishnan
• Huiqing Liu
• Seng Hong Seah
• Soon Heng Tan
• Guanglan Zhang
• Zhuo Zhang
and many more: students, folks from geneticXchange, MolecularConnections, and other collaborators….