Bioinformatics Needs for the post-genomic era Dr. Erik Bongcam-Rudloff The Linnaeus Centre for Bioinformatics.


From Egg to Adult in 3 × 10^9 Bases
A single cell, the fertilized egg, eventually differentiates into the ~300 different types of cells that make up an adult body. With a few exceptions, all of these cells contain the complete human genome but express only a subset of the genes. Gene expression patterns are determined largely by cell type, and vice versa.

The “body” has:
The genome
A comprehensive list of genes
Gene expression data
Protein localization in the cells
Information about protein/protein and protein/DNA interactions
Ways to store, display and query masses of data so activity can focus on relevant bits

Primary Flows of Information and Substance in the Cell

Why a Grid?
Molecular biology problems are growing faster than Moore's Law delivers computing power
Interest in bioinformatics from other disciplines is growing
New experimental approaches (genomics, proteomics, etc.) require new and more demanding solutions

Comparative Genomics
Comparison of whole genomes (e.g. human and mouse) and new techniques for phylogenetic footprinting.

RNomics
Tertiary structure prediction and novel RNA gene location in whole genomes.
We are conducting genome-wide scans for RNA regulatory elements and RNA genes using state-of-the-art comparative genomics tools. The analysis involves comparison of the human and mouse genomes using tools such as stochastic context-free grammars.
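To illustrate how a stochastic context-free grammar can score RNA-like sequences, here is a minimal sketch. It is not the tool used in this work: the toy hairpin grammar, its rule probabilities, and the function name inside_probability are all assumptions made for illustration. The grammar has a stem rule S -> x S y for Watson-Crick pairs, a rule S -> L that opens a loop, and loop rules L -> x L and L -> x for unpaired bases.

```python
from functools import lru_cache

# Toy hairpin SCFG (all probabilities are hypothetical, chosen to sum to 1 per nonterminal):
#   S -> x S y   for each of the 4 Watson-Crick pairs (x, y), probability 0.2 each
#   S -> L       probability 0.2
#   L -> x L     probability 0.7 * 0.25 for each base x
#   L -> x       probability 0.3 * 0.25 for each base x
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
P_PAIR = 0.2
P_S_TO_L = 0.2
P_EXTEND = 0.7
P_END = 0.3
P_BASE = 0.25


def inside_probability(seq):
    """Total probability that the toy grammar generates seq (sum over all parses)."""

    def loop(n):
        # Probability that L derives a given run of n unpaired bases (n >= 1).
        if n < 1:
            return 0.0
        return (P_EXTEND * P_BASE) ** (n - 1) * P_END * P_BASE

    @lru_cache(maxsize=None)
    def stem(i, j):
        # Probability that S derives seq[i:j].
        if j - i < 1:
            return 0.0
        total = P_S_TO_L * loop(j - i)  # S -> L: treat the whole span as a loop
        # S -> x S y: pair the outermost bases if they are complementary
        # and at least one base remains inside for the loop.
        if j - i >= 3 and (seq[i], seq[j - 1]) in PAIRS:
            total += P_PAIR * stem(i + 1, j - 1)
        return total

    return stem(0, len(seq))


if __name__ == "__main__":
    hairpin = "GGGAAACCC"      # ends can pair into a stem
    no_hairpin = "GGGAAAGGG"   # same length, ends cannot pair
    print(inside_probability(hairpin) > inside_probability(no_hairpin))
```

A sequence whose ends can fold into a stem receives a higher probability than one of the same length that cannot, which is the kind of signal a genome-wide scan for RNA structure would look for. Real RNA grammars are considerably richer (bulges, multiloops, trained parameters).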

Molecular Interactions
Large-scale in silico maps of the molecular interactions over entire proteomes and genomes. These maps provide quantitative functional models that bridge the biological with the chemical.

We are developing models of gene participation in biological processes. Such models are developed from microarray-based gene expression data and background knowledge, e.g. as provided by the so-called Gene Ontology. The GRID Test Bed will be an excellent computational environment for finding molecular classifiers associated with major diseases such as cancer, atherosclerosis and other diseases that kill many people in Europe.

What is needed?
Standard, stable interfaces to conceptual problem solvers / data / objects
A distributed way to store and analyse information
Security for user data
Avoiding duplication of implementation and computation

Protein structure prediction: an example
There are over 1.3 million sequences in the non-redundant protein database managed by the NCBI and over 19 thousand structures in the Protein Data Bank (PDB)
Using this data we have built a library of common protein substructures linking structure and sequence on a local level
Our library consists of over 4000 unique substructures, each associated with between seven and two thousand example sequence fragments
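One way to picture a library that links substructures to example sequence fragments is a simple mapping from a substructure identifier to its fragments. The sketch below is only an illustration under that assumption: the class name SubstructureLibrary, its methods, the identifiers and the fragments are all hypothetical and do not describe the actual library.

```python
from collections import defaultdict


class SubstructureLibrary:
    """Maps a substructure identifier to its example sequence fragments (hypothetical design)."""

    def __init__(self):
        self._fragments = defaultdict(list)

    def add_example(self, substructure_id, fragment):
        # Record one sequence fragment observed for this substructure.
        self._fragments[substructure_id].append(fragment)

    def fragments_for(self, substructure_id):
        # Return a copy so callers cannot modify the library by accident.
        return list(self._fragments.get(substructure_id, []))

    def substructures_matching(self, sequence):
        # Identifiers of substructures with at least one example fragment
        # occurring verbatim in the given protein sequence.
        return sorted(
            sid
            for sid, frags in self._fragments.items()
            if any(frag in sequence for frag in frags)
        )


if __name__ == "__main__":
    library = SubstructureLibrary()
    # Made-up identifiers and fragments, for illustration only.
    library.add_example("sub_0001", "AEELLKK")
    library.add_example("sub_0001", "AEEALKR")
    library.add_example("sub_0002", "GDGTV")
    print(library.substructures_matching("MKAEELLKKLGDGTVQ"))
```

Exact substring matching is the crudest possible link between sequence and structure; it stands in here for whatever property-based recognition the real library uses.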

In order to extract properties that recognize proteins containing particular substructures, we iteratively test different properties (and combinations of properties) on proteins that contain the substructure of interest and proteins that do not.
Calculating properties for all groups takes one week on ten Athlon XP (1.46 GHz, 1 GB RAM) processors.
In a more realistic search space, without the drastic search space reductions, we estimate that we would need approximately 700 processor days with 2 GB RAM.
Depending on the available resources, we would like to run several such trials in order to test different parameter settings. Thus our upper estimates may be multiplied by a factor of 5-10.
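The resource figures above can be checked with a few lines of arithmetic. This is a sketch only: the function names and the 100-processor grid are assumptions introduced for illustration, and the wall-clock conversion assumes perfect parallel scaling, which real workloads rarely achieve.

```python
# Figures quoted on the slide.
REDUCED_RUN_WALL_DAYS = 7        # "one week"
REDUCED_RUN_PROCESSORS = 10      # ten Athlon XP processors
REALISTIC_PROCESSOR_DAYS = 700   # estimate for the unreduced search space
TRIAL_FACTORS = (5, 10)          # several trials: multiply by a factor of 5-10


def processor_days(wall_days, processors):
    """Total work, in processor days, of a run using all processors for wall_days."""
    return wall_days * processors


def wall_clock_days(total_processor_days, processors):
    """Wall-clock days on a given number of processors, assuming perfect scaling."""
    return total_processor_days / processors


# Work done by the current (reduced search space) run.
reduced_run = processor_days(REDUCED_RUN_WALL_DAYS, REDUCED_RUN_PROCESSORS)

# Range of total work if several realistic trials are run.
multi_trial_range = tuple(REALISTIC_PROCESSOR_DAYS * f for f in TRIAL_FACTORS)

if __name__ == "__main__":
    hypothetical_grid_size = 100  # assumption, not from the slides
    print("Reduced run:", reduced_run, "processor days")
    print("Realistic run is", REALISTIC_PROCESSOR_DAYS / reduced_run, "times larger")
    print("Several trials:", multi_trial_range, "processor days")
    for total in multi_trial_range:
        days = wall_clock_days(total, hypothetical_grid_size)
        print(total, "processor days on", hypothetical_grid_size, "processors:", days, "days")
```

The current run amounts to 70 processor days, so the realistic search is ten times larger, and running several trials pushes the total to between 3500 and 7000 processor days.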