

Contents of this Talk [Used as intro to Genome Databases Seminar, 2002]
- Overview of bioinformatics
- Motivations for genome databases
- Analogy of virus reverse-engineering to genome analysis
- Questions to ask of a genome DB

Overview of Genome Databases Peter D. Karp, Ph.D. SRI International www-db.stanford.edu/dbseminar/seminar.html

Talk Overview
- Definition of bioinformatics
- Motivations for genome databases
- Computer virus analogy
- Issues in building genome databases

Definition of Bioinformatics
- Computational techniques for management and analysis of biological data and knowledge
  - Methods for disseminating, archiving, interpreting, and mining scientific information
- Computational theories of biology
- Genome databases are a subfield of bioinformatics

Motivations for Bioinformatics
- Growth in molecular-biology knowledge (literature)
- Genomics
  1. Study of genomes through DNA sequencing
  2. Industrial Biology

Example Genomics Datatypes
- Genome sequences
  - DOE Joint Genome Institute
    - 511M bases in Dec 2001
    - 11.97G bases since Mar 1999
- Gene and protein expression data
- Protein-protein interaction data
- Protein 3-D structures

Genome Databases
- Experimental data
  - Archive experimental datasets
  - Retrieving past experimental results should be faster than repeating the experiment
  - Capture alternative analyses
  - Lots of data, simpler semantics
- Computational symbolic theories
  - Complex theories become too large to be grasped by a single mind
  - The database is the theory
  - Biology is very much concerned with qualitative relationships
  - Less data, more complex semantics

Bioinformatics
- Distinct intellectual field at the intersection of CS and molecular biology
- Distinct field because researchers in the field should know CS, biology, and bioinformatics
- Spectrum from CS research to biology service
- Rich source of challenging CS problems
- Large, noisy, complex data-sets and knowledge-sets
- Biologists and funding agencies demand working solutions

Common Computer-Science Areas
- Database design and interoperation
- Machine learning
- Scientific visualization
- Combinatorial algorithms
- Distributed systems
- Text understanding

Bioinformatics Research
- algorithms + data structures = programs
- algorithms + databases = discoveries
- Combine sophisticated algorithms with the right content:
  - Properly structured
  - Carefully curated
  - Relevant data fields
  - Proper amount of data

Goals of Systems Biology
- Catalog the molecular parts lists of cells
- Understand the function(s) of each part
- Understand how those parts interact to produce the behavior of a cell or organism
- Understand the evolution of those molecular parts

Analogy: Genome Analysis and Virus Analysis
- Given: Virus binary executable file for known machine architecture
- Reverse engineer the program
  - Procedures
  - Call graph
  - Specifications for I/O behavior of the program and all procedures
- Capture and publish an annotated analysis of the virus
- Comparative analysis of related viruses

Genome Analysis
- Example: M. tuberculosis genome
- Given: 4.4Mbp of DNA (genome)
- Infer:
  - Molecular parts list of Mtb
  - A model of the biochemical machinery of Mtb cell
- DNA is a blueprint for the program of life

Start
- Virus: 4.4Mbyte binary program
- Genome: 4.4Mbp DNA sequence

Step 1
- Virus: Distinguish code from data segments; find procedure boundaries
- Genome: Distinguish coding from non-coding regions (gene finding)
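A minimal sketch of what "distinguishing coding from non-coding regions" can look like in its crudest form: scanning for open reading frames (ORFs). This is an illustration, not the method used in the talk. It assumes an ORF is a run that starts at ATG and ends at the first in-frame stop codon on the forward strand; the function name `find_orfs` and the `min_codons` threshold are hypothetical, and real gene finders also scan the reverse strand and rely on statistical models.

```python
# Naive gene finding: report forward-strand open reading frames (ORFs).
# Assumption: an ORF runs from an ATG to the first in-frame stop codon.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return sorted (start, end) index pairs of ORFs; end is exclusive."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):                      # three forward reading frames
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                       # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                # Count codons including the start and stop codons.
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                    # close it and keep scanning
    return sorted(orfs)
```

For example, `find_orfs("CCATGAAATAGGG")` reports the single ORF `ATGAAATAG` at indices 2 to 11.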

Step 2
- Virus: Predict semantics of procedures
- Genome: Predict gene functions
[Diagram: nodes A, B, C, D]

Step 3
- Virus: Predict procedure call graph
- Genome: Predict biochemical and gene networks
[Diagram: networks over nodes A, B, C, D]

Step 4
- Virus: Predict conditions under which procedures are invoked
- Genome: Predict expression of network fragments
[Diagram: network fragments over nodes A, B, C, D and Q, R, S]

Step 5
- Virus: Infer complete program specification
- Genome: Formulate dynamic cellular simulation

Step 6
- Virus: Internet publishing of structured program annotation with explanations, references, commentary
- Genome: Internet publishing of structured genome annotation with explanations, references, commentary

Step 7
- Virus: Comparative analysis of viruses; evolutionary relationships among viruses
- Genome: Comparative analysis of genomes; evolutionary relationships among genomes
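As a rough illustration of comparative analysis, the sketch below scores two sequences by the overlap of their k-mer sets (Jaccard similarity). This is a deliberately crude stand-in chosen for brevity, not the talk's method; real comparative genomics relies on alignment and phylogenetic models. The function names and the default `k=3` are assumptions.

```python
def kmers(seq, k=3):
    """Return the set of all length-k substrings of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b, k=3):
    """Jaccard similarity of the k-mer sets of a and b (0.0 to 1.0)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka and not kb:
        return 1.0  # two sequences too short to have k-mers count as identical
    return len(ka & kb) / len(ka | kb)
```

Identical sequences score 1.0 and sequences with no shared k-mers score 0.0; a matrix of such scores is one simple input for clustering related viruses or genomes.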

Step 8
- Virus: Identify measures to disable virus or prevent its spread
- Genome: Identify target proteins for anti-microbial drug discovery
[Diagram: network over nodes A, B, C, D and Q, R, S]

Database of Viruses
- Create a database that stores
  - Binaries for all viruses
  - All annotation of virus programs by different investigators
  - Comparative analyses
- Support
  - Remote API access
  - Click-at-a-time browsing

Reference on Major Genome Databases
- Nucleic Acids Research Database Issue
  - 112 databases

Questions to Ask of a New Genome Database

What are Database Goals and Requirements?
- How many users?
- What expertise do users have?
- What problems will database be used to solve?

What is its Organizing Principle?
- Different DBs partition the space of genome information in different dimensions
- Experimental methods (Genbank, PDB)
- Organism (EcoCyc, Flybase)

What is its Level of Interpretation?
- Laboratory data
- Primary literature (Genbank)
- Review (SwissProt, MetaCyc)
- Does DB model disagreement?

What are its Semantics and Content?
- What entities and relationships does it model?
- How does its content overlap with similar DBs?
- How many entities of each type are present?
- Sparseness of attributes and statistics on attribute values
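The last two questions can be answered mechanically once records are loaded. The sketch below assumes each database entry is a Python dict with a hypothetical `type` field, and that `None`, empty strings, and empty lists count as missing values; it counts entities per type and reports how often each attribute is actually filled in.

```python
from collections import Counter


def entity_counts(records, type_field="type"):
    """Count how many records of each entity type are present."""
    return Counter(rec.get(type_field, "unknown") for rec in records)


def attribute_fill_rates(records):
    """Fraction of records carrying a non-empty value for each attribute."""
    if not records:
        return {}
    filled = Counter()
    names = set()
    for rec in records:
        for name, value in rec.items():
            names.add(name)
            if value is not None and value != "" and value != []:
                filled[name] += 1
    return {name: filled[name] / len(records) for name in sorted(names)}
```

A fill rate well below 1.0 for an attribute signals that queries depending on it will silently miss most entries.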

What are Sources of its Data?
- Potential information sources
  - Laboratory instruments
  - Scientific literature
    - Manual entry
    - Natural-language text mining
  - Direct submission from the scientific community
    - Genbank
- Modification policy
  - DB staff only
  - Submission of new entries by scientific community
  - Update access by scientific community

What DBMS is Employed?
- None
- Relational
- Object oriented
- Frame knowledge representation system
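Of the four options, the frame knowledge representation system is the least familiar to most readers. The toy sketch below shows the core idea only: a frame is a named set of slots that inherits missing slots from a parent frame. The `Frame` class and the example frames are hypothetical and do not reproduce the API of any real frame system.

```python
class Frame:
    """A named bundle of slots that inherits missing slots from its parent."""

    def __init__(self, name, parent=None, **slots):
        self.name = name
        self.parent = parent
        self.slots = dict(slots)

    def get(self, slot):
        """Look up a slot locally, then up the parent chain."""
        frame = self
        while frame is not None:
            if slot in frame.slots:
                return frame.slots[slot]
            frame = frame.parent
        raise KeyError(slot)


# Hypothetical class hierarchy: Protein -> Enzyme -> ExampleKinase
protein = Frame("Protein", polymer_of="amino acids", located_in="cell")
enzyme = Frame("Enzyme", parent=protein, catalyzes="some reaction")
kinase = Frame("ExampleKinase", parent=enzyme, located_in="cytoplasm")
```

Here `kinase.get("polymer_of")` is answered by the `Protein` frame, while `kinase.get("located_in")` is answered locally because the more specific frame overrides the inherited value.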

Distribution / User Access
- Multiple distribution forms enhance access
- Browsing access with visualization tools
- API
- Portability

What Validation Approaches are Employed?
- None
- Declarative consistency constraints
- Programmatic consistency checking
- Internal vs external consistency checking
- What types of systematic errors might DB contain?
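The difference between declarative constraints and programmatic checking can be sketched with Python's standard `sqlite3` module. The `gene` table, its columns, and the "length should be a multiple of three" rule are all hypothetical choices for illustration. The `CHECK` clauses are declarative: the DBMS rejects bad rows at insert time. The `suspicious_genes` function is programmatic: it is ordinary code run over data that is already stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Declarative constraints: stated once in the schema, enforced by the DBMS.
conn.execute("""
    CREATE TABLE gene (
        name      TEXT PRIMARY KEY,
        start_pos INTEGER NOT NULL CHECK (start_pos >= 1),
        end_pos   INTEGER NOT NULL,
        CHECK (end_pos >= start_pos)
    )
""")


def try_insert(name, start_pos, end_pos):
    """Insert a gene; return False if a declared constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO gene VALUES (?, ?, ?)",
                         (name, start_pos, end_pos))
        return True
    except sqlite3.IntegrityError:
        return False


def suspicious_genes():
    """Programmatic check: flag genes whose length is not a multiple of 3."""
    rows = conn.execute("SELECT name, start_pos, end_pos FROM gene ORDER BY name")
    return [name for name, start, end in rows if (end - start + 1) % 3 != 0]
```

Inserting a gene whose end precedes its start fails immediately, whereas a gene with an odd length is stored and only surfaces when `suspicious_genes()` is run, which mirrors the choice between blocking errors and reporting them.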

Database Documentation
- Schema and its semantics
- Format
- API
- Data acquisition techniques
- Validation techniques
- Size of different classes
- Coverage of subject matter
- Sparseness of attributes
- Error rates

Relationship of Database Field to Bioinformatics
- Scientists generally ignorant of basic DB principles
  - Complex queries vs click-at-a-time access
  - Data model
  - Defined semantics for DB fields
  - Controlled vocabularies
  - Regular syntax for flatfiles
  - Automated consistency checking
- Most biologists take one programming class
- Evolution of typical genome database
- Finer points of DB research off their radar screen
- Handful of DB researchers work in bioinformatics
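To make "complex queries vs click-at-a-time access" concrete, the sketch below answers one question two ways using Python's standard `sqlite3` module. The table, gene names, and pathway names are invented for illustration. The first function states the question once as a query; the second imitates a user opening one entry page at a time and tallying by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gene (name TEXT PRIMARY KEY, pathway TEXT, product TEXT);
    INSERT INTO gene VALUES
        ('geneA', 'pathway1', 'enzyme'),
        ('geneB', 'pathway1', 'enzyme'),
        ('geneC', 'pathway2', 'transporter'),
        ('geneD', 'pathway2', 'enzyme'),
        ('geneE', 'pathway1', 'regulator');
""")


def enzymes_per_pathway_query():
    """Complex query: one declarative statement answers the question."""
    rows = conn.execute("""
        SELECT pathway, COUNT(*) FROM gene
        WHERE product = 'enzyme'
        GROUP BY pathway
    """).fetchall()
    return dict(rows)


def enzymes_per_pathway_clicking():
    """Click-at-a-time: fetch each entry separately and tally in code."""
    counts = {}
    names = [row[0] for row in conn.execute("SELECT name FROM gene")]
    for name in names:  # one lookup per entry, like one page view per gene
        pathway, product = conn.execute(
            "SELECT pathway, product FROM gene WHERE name = ?", (name,)
        ).fetchone()
        if product == "enzyme":
            counts[pathway] = counts.get(pathway, 0) + 1
    return counts
```

Both functions return the same counts, but the second needs one round trip per gene, which is the cost a browse-only interface imposes on every user with a genome-scale question.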

Database Field
- For many years, the majority of bioinformatics DBs did not employ a DBMS
  - Flatfiles were the rule
  - Scientists want to see the data directly
  - Commercial DBMSs too expensive, too complex
  - DBAs too expensive
- Most scientists do not understand
  - Differences between BA, MS, PhD in CS
  - CS research vs applications
  - Implications for project planning, funding, bioinformatics research
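Flatfiles remain workable when they follow a regular syntax and a controlled vocabulary, two of the principles listed earlier in the talk. The sketch below parses a made-up format (two-letter tag, whitespace, value, with `//` ending each record) that is only loosely modeled on tag-value flatfiles; the tags `ID`, `TY`, `DE` and the allowed type values are assumptions, not any real database's specification.

```python
import re

# Regular syntax: a two-letter uppercase tag, whitespace, then the value.
LINE = re.compile(r"^([A-Z]{2})\s+(.*\S)\s*$")

# Hypothetical controlled vocabulary for the TY (entry type) field.
ALLOWED_TYPES = {"gene", "protein"}


def parse_flatfile(text):
    """Parse records into dicts mapping each tag to its list of values."""
    records, current = [], {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue                       # ignore blank lines
        if line.strip() == "//":           # record terminator
            if current:
                records.append(current)
            current = {}
            continue
        match = LINE.match(line)
        if match is None:
            raise ValueError(f"line {lineno}: not in 'TAG value' form")
        tag, value = match.groups()
        if tag == "TY" and value not in ALLOWED_TYPES:
            raise ValueError(f"line {lineno}: unknown type {value!r}")
        current.setdefault(tag, []).append(value)
    if current:
        raise ValueError("last record is missing its '//' terminator")
    return records
```

Because every line either matches the pattern or raises an error with its line number, malformed entries are caught when the file is loaded instead of surfacing later as silent gaps.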

Recommendation
- Teaching scientists programming is not enough
- Teaching scientists how to build a DBMS is irrelevant
- Teach scientists basic aspects of databases and symbolic computing
  - Database requirements analysis
  - Data models, schema design
  - Knowledge representation, ontologies
  - Formal grammars
  - Complex queries
  - Database interoperability