Collaborative Information Management: Advanced Information Processing in Bioinformatics Joost N. Kok LIACS - Leiden Institute of Advanced Computer Science.

Presentation transcript:

Collaborative Information Management: Advanced Information Processing in Bioinformatics Joost N. Kok LIACS - Leiden Institute of Advanced Computer Science & LUMC - Leiden University Medical Center

BioRange: Bioinformatics for microarray technology; bioinformatics for proteomics and metabolomics; integrative bioinformatics; VL-e informatics for bioinformatics applications; test bed with “real-life applications”.

BioRange CIM, AIM in BioINF. Five research lines: Information Structuring; Heterogeneous Data Integration; Advanced Mining Algorithms; Data Interlinking and Integration; Data Storage and Management.

1: Advanced Mining Algorithms

Data Mining: Data mining is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data. The patterns should be: useful; novel, surprising; comprehensible; valid (accurate).

Data Mining: It is somewhat comparable to statistics (and often based on it), but goes further: whereas statistics aims more at validating given hypotheses, data mining often generates and tests millions of potential patterns in the hope of finding some that are useful.

Intelligent Interfaces

Case study: SNP data. A genome scan comprising 500K data points (Single Nucleotide Polymorphisms, or SNPs) in 900 subjects from families expressing survival to extremely high ages (longevity). The aim of analysing this set of 450 million data points (500K SNPs × 900 subjects) is to recognize patterns specific to the genetic make-up of long survivors.

Case study: SNP data. The genetic scan data will be combined with gene expression data (30,000 data points per subject in 100 subjects), protein data (NMR spectra from blood parameters in hundreds of subjects) and imaging data (quantitative photography of facial ageing parameters).

Case study: SNP data. Subjects with SNPs; classes (Young, Old). Patterns are sought that are: above a certain support within Y, O; above a certain difference between classes Y, O; above a certain correlation with a class Y, O; etc.
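The class-based criteria on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each subject is modelled as a set of SNP identifiers plus a class label ("Y" or "O"), and the function names, thresholds, and toy data are hypothetical rather than taken from the presentation.

```python
def class_support(subjects, pattern, label):
    """Fraction of subjects in class `label` that carry every SNP in `pattern`."""
    in_class = [snps for snps, cls in subjects if cls == label]
    if not in_class:
        return 0.0
    return sum(1 for snps in in_class if pattern <= snps) / len(in_class)


def is_interesting(subjects, pattern, min_support=0.5, min_difference=0.3):
    """Keep a pattern if it is frequent in some class and differs between classes."""
    s_young = class_support(subjects, pattern, "Y")
    s_old = class_support(subjects, pattern, "O")
    frequent = max(s_young, s_old) >= min_support
    discriminative = abs(s_old - s_young) >= min_difference
    return frequent and discriminative


# Hypothetical toy data: (set of SNP identifiers, class label).
subjects = [
    ({"rs1", "rs2"}, "O"),
    ({"rs1", "rs2", "rs3"}, "O"),
    ({"rs1"}, "Y"),
    ({"rs3"}, "Y"),
]

both = frozenset({"rs1", "rs2"})
print(class_support(subjects, both, "O"), class_support(subjects, both, "Y"))
print(is_interesting(subjects, both))
print(is_interesting(subjects, frozenset({"rs3"})))
```

A correlation-based criterion would replace the simple support difference with a statistical measure; which measure the project used is not stated on the slide.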

Substructures: sequences (DNA); trees (XML documents); graphs (molecules). GASTON. Tools: hms.liacs.nl

Mutagenicity data set of 4069 compounds (56% mutagenic)

To boldly go where no chemist has gone before (08 February 2006)
Studying the interactions between different molecular fragments is taking researchers to the uncharted regions of chemical space. Chemical space, consisting of all possible stable molecules, is mind-bogglingly vast. Theoretical chemists have calculated that there are more possible molecules based on hexane (10^29) than there are stars in the visible universe. Chemists have only made fairly tentative journeys into this space, with the largest chemical databases currently containing up to 25 million different molecules. Ad IJzerman from Leiden University, the Netherlands, and colleagues realised that analysing these chemical databases could reveal which regions of chemical space have been extensively explored and which remain relatively uncharted. IJzerman’s team split the molecular structures contained in the US National Cancer Institute’s database into component fragments, consisting of rings, substituents and several types of linkers. This generated different fragments, of which the vast majority (70 per cent) occurred only once. The chemists selected the 1730 fragments that occurred in more than 20 different molecules and calculated the number of times that each possible pair of fragments occurred in the same molecule. Some pairs of fragments were commonly found together, forming what the researchers termed ‘chemical clichés’, but others were rarely found in the same molecule. By generating molecules containing the fragments that aren’t often brought together, predict the researchers, chemists should be able to open up new areas of chemical space and potentially discover new molecules with interesting properties. IJzerman has already demonstrated the benefits of this fragment analysis to a medicinal chemist. She was having problems with a particular compound and he suggested possible alternative ring systems, based on his list of the most popular ring fragments.
‘It turned out that one of our top 40 ring systems was actually her intended modification, reached after much deliberation,’ he told Chemistry World.
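The fragment analysis described in the article (keep fragments that occur in enough molecules, then count how often each pair occurs in the same molecule) can be illustrated roughly as follows. The sketch assumes molecules have already been split into fragment names; the splitting step, the fragment names, and the tiny threshold are hypothetical stand-ins, not the researchers' actual procedure.

```python
from collections import Counter
from itertools import combinations


def frequent_fragments(molecules, min_molecules):
    """Fragments that occur in more than `min_molecules` different molecules."""
    counts = Counter(frag for frags in molecules.values() for frag in set(frags))
    return {frag for frag, n in counts.items() if n > min_molecules}


def pair_cooccurrence(molecules, fragments):
    """For each unordered fragment pair, count the molecules containing both."""
    pairs = Counter()
    for frags in molecules.values():
        present = sorted(set(frags) & fragments)
        pairs.update(combinations(present, 2))
    return pairs


def rare_pairs(pairs, fragments, max_count=0):
    """Pairs of frequent fragments that are (almost) never found together."""
    return [p for p in combinations(sorted(fragments), 2) if pairs[p] <= max_count]


# Hypothetical toy database: molecule id -> list of fragment names.
molecules = {
    "m1": ["benzene", "amide", "methyl"],
    "m2": ["benzene", "amide"],
    "m3": ["benzene", "pyridine"],
    "m4": ["pyridine", "methyl"],
}

frequent = frequent_fragments(molecules, min_molecules=1)
pairs = pair_cooccurrence(molecules, frequent)
print(pairs.most_common(1))         # the most common pair, a 'chemical cliché'
print(rare_pairs(pairs, frequent))  # candidate combinations to explore
```

With the article's figures, the threshold would be 20 molecules and the pair table would cover the 1730 selected fragments.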

2: Data Storage and Management

Patternbases: Pattern Databases = Patterns + Data. Query languages work on Patterns + Data. Since patternbases provide an architecture for pattern discovery and a means to discover and use those patterns through the query language, data mining becomes in essence an interactive querying process.

Patternbases: Derive new patterns from data + old patterns. Apriori Algorithm: Frequent Item Sets. Frequent Item Sets + Data: Association Rules.
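The two steps named on this slide, frequent item sets via Apriori and then association rules derived from those item sets, can be sketched as below. This is a minimal, unoptimised illustration; the function names, thresholds, and toy transactions are assumptions made for the example and do not come from the presentation or from any particular patternbase system.

```python
from itertools import combinations


def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def apriori(transactions, min_support):
    """Return {frozenset: support} for all frequent item sets (level-wise search)."""
    items = {item for t in transactions for item in t}
    current = [frozenset([i]) for i in sorted(items)]
    frequent = {}
    k = 1
    while current:
        level = {c: support(transactions, c) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        # Join frequent k-sets into (k+1)-candidates, then prune candidates
        # that have an infrequent k-subset (the Apriori property).
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent


def association_rules(frequent, min_confidence):
    """Derive rules lhs -> rhs from the frequent item sets (the 'old patterns')."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                confidence = supp / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules


# Hypothetical toy transactions.
transactions = [{"A", "B"}, {"A", "B", "C"}, {"A", "C"}, {"B"}]
frequent = apriori(transactions, min_support=0.5)
rules = association_rules(frequent, min_confidence=0.9)
print(sorted((sorted(s), v) for s, v in frequent.items()))
print(rules)
```

Note that rule generation reuses the stored supports instead of rescanning the data for every rule, which is one way to read "derive new patterns from data + old patterns".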

Patternbases: Derive new patterns from data + old patterns. Find all item sets that are correlated with classes. Fix a We can prune the search space by only considering frequent item sets with minimum support.

Patternbases

Research Lines BioRange. Five research lines: Information Structuring; Heterogeneous Data Integration; Advanced Mining Algorithms; Data Interlinking and Integration; Data Storage and Management.