Automating Biology using

Slides:



Advertisements
Similar presentations
Discovery Informatics Workshop February 2-3, 2012 NSF Workshop on Discovery Informatics Vasant Honavar Program Director Information & Intelligent Systems.
Advertisements

INTRODUCTION TO AP BIOLOGY What is AP Biology  AP Biology is designed to be the equivalent of a University Introductory Biology Course  It.
Lesson Overview 1.1 What Is Science?.
Predicting essential genes via impact degree on metabolic networks ISSSB’11 Takeyuki Tamura Bioinformatics Center, Institute for Chemical Research Kyoto.
Social networks, in the form of bibliographies and citations, have long been an integral part of the scientific process. We examine how to leverage the.
Prof. Carolina Ruiz Computer Science Department Bioinformatics and Computational Biology Program WPI WELCOME TO BCB4003/CS4803 BCB503/CS583 BIOLOGICAL.
Darwinian Genomics Csaba Pal Biological Research Center Szeged, Hungary.
Metabolic functions of duplicate genes in Saccharomyces cerevisiae Presented by Tony Kuepfer et al
A Systematic approach to the Large-Scale Analysis of Genotype- Phenotype correlations Paul Fisher Dr. Robert Stevens Prof. Andrew Brass.
AI and Bioinformatics From Database Mining to the Robot Scientist.
APRIL, Application of Probabilistic Inductive Logic Programming, IST Albert-Ludwigs-University, Freiburg, Germany & Imperial College of Science,
Systems Biology Existing and future genome sequencing projects and the follow-on structural and functional analysis of complete genomes will produce an.
A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae Article by Peter Uetz, et.al. Presented by Kerstin Obando.
Gene expression analysis summary Where are we now?
Experimental and computational assessment of conditionally essential genes in E. coli Chao WANG, Oct
Use of Ontologies in the Life Sciences: BioPax Graciela Gonzalez, PhD (some slides adapted from presentations available at
Integrated analysis of regulatory and metabolic networks reveals novel regulatory mechanisms in Saccharomyces cerevisiae Speaker: Zhu YANG 6 th step, 2006.
Systems Biology Biological Sequence Analysis
Indiana University Bloomington, IN Junguk Hur Computational Omics Lab School of Informatics Differential location analysis A novel approach to detecting.
Pathway databases Goto S, Bono H, Ogata H, Fujibuchi W, Nishioka T, Sato K, Kanehisa M. (1997) Organizing and computing metabolic pathway data in terms.
Scientific method - 1 Scientific method is a body of techniques for investigating phenomena and acquiring new knowledge, as well as for correcting and.
Introduction to molecular networks Sushmita Roy BMI/CS 576 Nov 6 th, 2014.
Nature of Science Science Nature of Science Environmental Science Outline: Outline: Science As a Way of Knowing Science As a Way of Knowing  Scientific.
Systematic Analysis of Interactome: A New Trend in Bioinformatics KOCSEA Technical Symposium 2010 Young-Rae Cho, Ph.D. Assistant Professor Department of.
Automating Drug Design Using Robot Scientists
Yeast-based automated high-throughput screens to identify anti-parasitic lead compounds by Elizabeth Bilsland, Andrew Sparkes, Kevin Williams, Harry J.
Semantic Web Technologies Lecture # 2 Faculty of Computer Science, IBA.
Srihari-CSE730-Spring 2003 CSE 730 Information Retrieval of Biomedical Text and Data Inroduction.
Genome-scale Metabolic Reconstruction and Modeling of Microbial Life Aaron Best, Biology Matthew DeJongh, Computer Science Nathan Tintle, Mathematics Hope.
SCI Scientific Inquiry The Big Picture: Science, Technology, Engineering, etc.
Automated Explanation of Gene-Gene Relationships Wacek Kuśnierczyk.
Managing Information Quality in e-Science using Semantic Web technology Alun Preece, Binling Jin, Edoardo Pignotti Department of Computing Science, University.
Abduction, Induction, and the Robot Scientist
Chapter 13. The Impact of Genomics on Antimicrobial Drug Discovery and Toxicology CBBL - Young-sik Sohn-
Functional Genomic Hypothesis Generation and Experimentation by a Robot Scientist King et al, Nature : Presented by Monica C. Sleumer February.
GTL Facilities Computing Infrastructure for 21 st Century Systems Biology Ed Uberbacher ORNL & Mike Colvin LLNL.
Combining Inductive Logic Programming, Active Learning and Robotics to Discover the Function of Genes by C.H. Bryant, S.H. Muggleton, S.G. Oliver, D.B.
Science is a process Scientific inquiry is a search for information and explanation.
Helping scientists collaborate BioCAD. ©2003 All Rights Reserved.
Scientific Inquiry.
BIOINFORMATICS ON NETWORKS Nick Sahinidis University of Illinois at Urbana-Champaign Chemical and Biomolecular Engineering.
Ontologies GO Workshop 3-6 August Ontologies  What are ontologies?  Why use ontologies?  Open Biological Ontologies (OBO), National Center for.
Knowledge Representation of Statistic Domain For CBR Application Supervisor : Dr. Aslina Saad Dr. Mashitoh Hashim PM Dr. Nor Hasbiah Ubaidullah.
Top Four Essential TAIR Resources Debbie Alexander Metabolic Pathway Databases for Arabidopsis and Other Plants Peifen Zhang.
1 Departament of Bioengineering, University of California 2 Harvard Medical School Department of Genetics Metabolic Flux Balance Analysis and the in Silico.
PIRSF Classification System PIRSF: Evolutionary relationships of proteins from super- to sub-families Homeomorphic Family: Homologous proteins sharing.
CHAPTER 1 – THE SCIENCE OF BIOLOGY What Is Science? (A) Organized way of using evidence to learn about the natural world. (B) Collection of knowledge that.
The Scientific Method.
Lesson Overview Lesson Overview What Is Science? Lesson Overview 1.1 What Is Science?
Metabolic Network Inference from Multiple Types of Genomic Data Yoshihiro Yamanishi Centre de Bio-informatique, Ecole des Mines de Paris.
Theme 2: Data & Models One of the central processes of science is the interplay between models and data Data informs model generation and selection Models.
Bioinformatics and Computational Biology
Genome Biology and Biotechnology The next frontier: Systems biology Prof. M. Zabeau Department of Plant Systems Biology Flanders Interuniversity Institute.
Introduction to biological molecular networks
Chapter 1.1 – What is Science?. State and explain the goals of science. Describe the steps used in the scientific method. Daily Objectives.
4/25/2017 Lesson 1-1 Nature of Science.
Opportunities for Text Mining in Bioinformatics (CS591-CXZ Text Data Mining Seminar) Dec. 8, 2004 ChengXiang Zhai Department of Computer Science University.
Science is an organized way of gathering and analyzing evidence about the natural world. - a way of thinking, observing, and “knowing” - explanations.
Lesson Overview Lesson Overview What Is Science? Lesson Overview 1.1 What Is Science?
Discovering functional interaction patterns in Protein-Protein Interactions Networks   Authors: Mehmet E Turnalp Tolga Can Presented By: Sandeep Kumar.
1 Genomics Advances in 1990 ’ s Gene –Expressed sequence tag (EST) –Sequence database Information –Public accessible –Browser-based, user-friendly bioinformatics.
Scientific Method 1.Observe 2.Ask a question 3.Form a hypothesis 4.Test hypothesis (experiment) 5.Record and analyze data 6.Form a conclusion 7.Repeat.
What is Science?? Biology IA Spring Goals of Science To investigate and understand the natural world To explain events in the natural world Use.
Chapter 1: Section 1 What is Science?. What Science IS and IS NOT.. The goal of Science is to investigate and understand the natural world, to explain.
Intro to Biology. The goal of science is to: investigate and understand the natural world. investigate and understand the natural world. explain events.
High throughput biology data management and data intensive computing drivers George Michaels.
CHAPTER 1 – THE SCIENCE OF BIOLOGY What Is Science? (A) Organized way of using evidence to learn about the natural world. (B) Collection of knowledge that.
TDM in the Life Sciences Application to Drug Repositioning *
Biology and You.
Biological Science Applications in Agriculture
Presentation transcript:

Automating Biology using 4/16/2017 Automating Biology using Robot Scientists Ross D. King Department of Computer Science Aberystwyth University rdk@aber.ac.uk

4/16/2017 Automating Biology

The Concept of a Robot Scientist 4/16/2017 We have developed the first computer system that is capable of originating its own experiments, physically doing them, interpreting the results, and then repeating the cycle. Background Knowledge Hypothesis Formation Analysis Consistent Hypotheses Experiment Results Interpretation Experiment selection Final Theory Robot

Motivation: Philosophical 4/16/2017 Motivation: Philosophical What is Science? The question whether it is possible to automate the scientific discovery process seems to me central to understanding science. There is a strong philosophical position which holds that we do not fully understand a phenomenon unless we can make a machine which reproduces it. “What I cannot create, I do not understand” (Richard Feynman).

Motivation: AI Science is an excellent test bed for AI Science v Chess 4/16/2017 Motivation: AI Science is an excellent test bed for AI Science v Chess abstract world: 64 squares, 36 pieces. Computers play chess as well or better than the best humans, and computers can now make strikingly beautiful moves. Science v General Intelligence Abstraction Nature is honest

Motivation: Technological 4/16/2017 Motivation: Technological In many areas of science our ability to generate data is outstripping our ability to analyse it. One scientific area where this is true is Systems Biology, where data is now being generated on an industrial scale. The analysis of scientific data needs to become as industrialised as its generation.

Technological Advantages 4/16/2017 Technological Advantages Robot Scientists have the potential to increase the productivity of science by enabling the high-throughput testing of hypotheses. Robot Scientists have the potential to improve the quality of science by enabling the description of experiments in greater detail and semantic clarity.

4/16/2017 Scientific Discovery Meta-Dendral: Analyis of mass-spectrometry data. Feigenbaum, Djerassi, Lederburg (1969). Bacon: Rediscovering physics and chemistry. Langley, Bradshaw, Simon (1979). Automated discovery in a chemistry laboratory. Zytkow, Zhu, Hussman (1990).

Robot Scientist Timeline 4/16/2017 Robot Scientist Timeline 1999-2004 Initial Robot Scientist Project Limited Hardware Collaboration with Douglas Kell (Aber Biology), Steve Oliver (Manchester), Stephen Muggleton (Imperial) King et al. (2004) Nature, 427, 247-252 2004-2011 Adam Project Sophisticated Laboratory Automation Collaboration with Steve Oliver (Cambridge). King et al. (2009) Science, 324, 85-89 2008-2011 Eve Project

4/16/2017 Adam

The Experimental Cycle 4/16/2017 The Experimental Cycle Background Knowledge Hypothesis Formation Analysis Consistent Hypotheses Experiments(s) Experiment(s) selection Final Theory Robot Results Interpretation

The Application Domain 4/16/2017 The Application Domain Functional genomics In yeast (S. cerevisiae) ~15% of the 6,000 genes still have no known function. EUROFAN 2 made all viable single deletant strains. Task to determine the “function” of a gene by growth experiments.

4/16/2017 Logical Cell Model We have developed a logical formalism for modelling metabolic pathways (encoded in Prolog). This is essentially a directed labeled hyper-graph: with metabolites as nodes and enzymes as arcs. If a path can be found from cell inputs (metabolites in the growth medium) to all the cell outputs (essential compounds) then the cell can grow.

4/16/2017 ß

Genome Scale Model of Yeast Metabolism 4/16/2017 Genome Scale Model of Yeast Metabolism It covers most of what is known about yeast metabolism. Includes 1,166 ORFs (940 known, 226 inferred). Growth if path from growth medium to defined end-points. State-of-the-art accuracy in predicting cell viability. Now integrated with Yeast 4.0.

The Experimental Cycle 4/16/2017 The Experimental Cycle Background Knowledge Hypothesis Formation Analysis Consistent Hypotheses Experiments(s) Experiment(s) selection Final Theory Robot Results Interpretation

Locally Orphan Enzymes 4/16/2017 Locally Orphan Enzymes Adam’s model of yeast metabolism has “locally orphan enzymes” these catalyse biochemical reactions known to be in yeast, but for which the coding genes are unknown. Adam uses bioinformatic methods to abduce genes which could encode these orphan enzymes - hypotheses.

Automated Model Completion 4/16/2017 Automated Model Completion Model of Metabolism Experiment Formation Hypothesis Formation REACTION Bioinformatics Database ? Experiment * Explain experimental cycle + FASTA32 PSI-BLAST Gene Identification Deduction orthologous(Gene1, Gene2) → similar_sequence(Gene1, Gene2). Abduction similar_sequence(Gene1, Gene2) → orthologous(Gene1, Gene2).

4/16/2017 ß ?

The Experimental Cycle 4/16/2017 The Experimental Cycle Background Knowledge Hypothesis Formation Analysis Consistent Hypotheses Experiments(s) Experiment(s) selection Final Theory Robot Results Interpretation

The Form of the Experiments 4/16/2017 The Form of the Experiments Hypothesis 1: Gene YER152C codes for the enzyme the reaction: chorismate  prephenate. Hypothesis 2: Gene YGL060W codes for the enzyme the reaction: chorismate  prephenate. These can be tested by: Growing YER152C in environment +/- prephenate. Growing YGL060W in environment +/- prephenate.

4/16/2017 ß ?

The Experimental Cycle 4/16/2017 The Experimental Cycle Background Knowledge Hypothesis Formation Analysis Consistent Hypotheses Experiments(s) Experiment(s) selection Final Theory Robot Results Interpretation

4/16/2017 Adam One of the most sophisticated pieces of laboratory automation in the world. Designed to fully automate yeast growth experiments. Has a -20C freezer, 3 incubators, 2 readers, 3 liquid handlers, 3 robotic arms, 2 robot tracks, a centrifuge, a washer, an environmental control system, etc. Is capable of initiating ~1,000 new experiments and >200,000 observations per day in a continuous cycle.

4/16/2017 LIMS Setup

4/16/2017 Diagram of Adam

4/16/2017 Adam

4/16/2017 Adam in Action

4/16/2017

4/16/2017 Growth plates

4/16/2017 Example Growth Curves Arginine as N source

4/16/2017 Growth curves OD OD Time Time OD OD Time Time

The Experimental Cycle 4/16/2017 The Experimental Cycle Background Knowledge Hypothesis Formation Analysis Consistent Hypotheses Experiments(s) Experiment(s) selection Final Theory Robot Results Interpretation

Qualitative to Quantitative 4/16/2017 Qualitative to Quantitative The functions of most genes in S. cerevisiae that when deleted result in auxotrophy (no growth) have already been discovered. Most genes of unknown function only affect growth quantitatively. They may have slower growth (bradytrophs), faster growth, higher/lower biomass yield, etc.

The Experimental Cycle 4/16/2017 The Experimental Cycle Background Knowledge Hypothesis Formation Analysis Consistent Hypotheses Experiments(s) Experiment(s) selection Final Theory Robot Results Interpretation

Discovery of Novel Science 4/16/2017 Discovery of Novel Science

4/16/2017 Novel Science Adam generated and confirmed twelve novel functional-genomics hypotheses concerning the identify of genes encoding enzymes catalysing orphan reactions in the metabolic network of the yeast S. cerevisiae. Adam's conclusions have been manually verified using bioinformatic and biochemical evidence.

Novel Scientific Knowledge 4/16/2017 Novel Scientific Knowledge

4/16/2017 A 50 Year Old Puzzle The enzyme 2-aminoadipate: 2-oxoglutarate aminotransferase was a locally orphan enzyme. It is in the lysine biosynthesis pathway which has been studied for 50 years in fungi: target for antibiotics, and on path to penicillin. Adam formed three hypotheses for the gene to encode this enzyme: YER152C, YJL060W, and YGL202W (in that order of probability). Currently KEGG states that YGL202W is the gene. Evidence from 1960’s that at least 2 isoenzymes are involved.

Confirmed New Knowledge 4/16/2017 Adam’s differential growth experiments were consistent with all three genes encoding 2-oxoglutarate aminotransferase. Manual experiments: purified protein + enzyme assays are consistent. YGL202W literature confirmed. YJL060W (was annotated as an arylformamidase, new (08) annotation kynurenine aminotransferase) YER152C (currently not annotated) YGL202W & YJL060W double knockout is lethal

4/16/2017 Adam’s current work

Learning Systems Biology 4/16/2017 Learning Systems Biology The goal is to get Adam to automatically improve its state-of-the-art FBA model by designing new experiments. The experiments are “perturbation” growth experiment remove gene(s) add metabolite. Combines ideas from logical and FBA modelling

Overview

4/16/2017 Formalising Biology

Formalization of Science 4/16/2017 Formalization of Science The goal of science is to increase our knowledge of the natural world through the performance of experiments. This knowledge should, ideally, be expressed in a formal logical language. Formal languages promote semantic clarity, which in turn supports the free exchange of scientific knowledge and simplifies scientific reasoning.

Robot Scientist & Formalisation 4/16/2017 Robot Scientist & Formalisation Robot Scientists provide excellent test-beds for the development of methodologies for formalising science. Using them it is possible to completely capture and digitally curate all aspects of the scientific process. The ontology LABORS is designed to enable the open access of the Robot Scientist experimental data and metadata to the scientific community. Soldatova et al. (2006) Bioinformatics

Adam’s Investigations 4/16/2017 Adam’s Investigations This formalisation involves >10,000 different research units in a nested tree-like structure 11 levels deep. It logically connects >6.6 million OD600nm measurements to hypotheses, experimental goals, results, etc. No previous large-scale experimental work has been so comprehensively described and recorded.

4/16/2017

Levels in the Formalisation 4/16/2017 Levels in the Formalisation Investigation into the automation of Science Investigation into the automation of novel science Investigation into the automated discovery of genes encoding orphan enzymes Automated study of E.C.2.6.1.39 encoding Cycle 1 of automated study of YER152C function YER152C and Lysine automated trial Experiment 1 (wild-type no metabolite) Replicate 1 (well) Observation 1

automated study of yer152c function 4/16/2017 automated study: automated study of yer152c_function has domain of study: functional genomics has investigator = robot scientist Adam has goal: 'To test the hypothesis that gene YER152C encodes an enzyme with enzyme class E.C.2.6.1.39'. has organism of study: Saccharomyces Cerevisiae has ncbi taxonomy ID: 4932 has hypotheses-set: has research hypothesis 1: encodes(yer152c,ec_2_6_1_39) has negative hypothesis 2: not encodes(yer152c,ec_2_6_1_39) has cycle 1 of study has study result: the strength of evidence that encodes(yer152c,ec_2_6_1_39): highest accuracy of random forest evidence: 74% proportion of random forest evidence >=70%: 2/3 has study conclusion: hypothesis 1 confirmed has text representation: a:automated study(X) :- a:automated_study_of_yer152c_function. a:hypotheses-set(X) :- a:research_hypothesis(X). a:cycle_of_study(X) :- a:cycle_1_of_study_(X). a:hypotheses-set(X) :- a:negative_hypothesis(X). a:domain_of_study(Y) :- a: automated study(X), a:has_ domain_of_study(X,Y). a:investigator(Y) :- a: automated study(X), a:has_ investigator(X,Y). a:goal(Y) :- a: automated study(X), a:has_goal(X,Y). a:organism_of_study (Y) :- a: automated study(X), a:has_organism_of_study(X,Y). a:hypotheses-set(Y) :- a: automated study(X), a:has_hypotheses-set(X,Y). a:cycle_of_study(Y) :- a: automated study(X), a:has_cycle_of_study(X,Y). a:study_result(Y) :- a: automated study(X), a:has_study_result(X,Y). a:study_conclusion(Y) :- a: automated study(X), a:has_study_conclusion(X,Y). a:domain_of_study(X) :- a:functional_genomics. a: investigator(X) :- a:adam. a:goal(X) :- a: to_test_the_hypothesis_that_gene_YER152C _encodes_an_enzyme_with_enzyme_class_E_C_2_6_1_39. a:organism_of_study(X) :- a:saccharomyces_cerevisiae. a:study_result(X) :- a:the_strength_of_evidence_of_hyypothesis_1. a:study_conclusion(X) :- a:hypothesis_1_confirmed. has datalog representation: <?xml version="1.0"?> <rdf:RDF xmlns="http://www.owl-ontologies.com/Ontology1204198571.owl#" <owl:Class rdf:ID="goal"/> <owl:Class rdf:ID="study_result"/> <owl:Class rdf:ID="ncbi_taxonomy_ID"/> <owl:Class rdf:ID="cycle_of_study"/> <owl:Class rdf:ID="negative_hypothesis"> <rdfs:subClassOf> <owl:Class rdf:ID="hypotheses-set"/> </rdfs:subClassOf> </owl:Class> <owl:Class rdf:ID="domain_of_study"/> <owl:Class rdf:ID="organism_of_study"/> <owl:Class rdf:ID="cycle_1_of_study_"> <rdfs:subClassOf rdf:resource="#cycle_of_study"/> <owl:Class rdf:ID="automated_study"> <owl:Restriction> <owl:someValuesFrom rdf:resource="#goal"/> <owl:onProperty> <owl:ObjectProperty rdf:ID="has_goal"/> </owl:onProperty> </owl:Restriction> <owl:someValuesFrom rdf:resource="#organism_of_study"/> <owl:ObjectProperty rdf:ID="has_organism_of_study"/> ………………………………………………………. has OWL representation:

Conclusions 4/16/2017 Automation is becoming increasingly important in scientific research e.g. DNA sequencing, drug design. The Robot Scientist concept represents the logical next step in scientific automation. The Robot Scientist Adam is the first machine to have discovered novel scientific knowledge. Scientific knowledge should be formalised to ensure the comprehensibility, reproducibility, and free exchange of knowledge with human and robot scientists.

Acknowledgments ABERYSTWYTH CAMBRIDGE Wayne Aubrey Elizabeth Bilsland 4/16/2017 Acknowledgments ABERYSTWYTH Wayne Aubrey Amanda Clare Douglas Kell Maria Liakata Chuan Lu Magda Markham Katherine Martin Ronald Pateman Jem Rowland Andrew Sparkes Larisa Soldatova Mike Young Ken Whelan CAMBRIDGE Elizabeth Bilsland Steve Oliver Pınar Pir