Flexible genome retrieval for supporting in-silico studies of endobacteria-AMFs

Presentation transcript:

Flexible genome retrieval for supporting in-silico studies of endobacteria-AMFs
S. Montani (1), G. Leonardi (1), S. Ghignone (2), L. Lanfranco (2)
(1) Dipartimento di Informatica, University of Piemonte Orientale, Alessandria, Italy
(2) Dipartimento di Biologia Vegetale, University of Turin, Italy

Arbuscular mycorrhizal fungi (AMFs)
Obligate symbionts in strict association with the roots of land plants
In soil: positive impacts on plant health and productivity
Often in further symbiosis with bacteria
Tripartite system: (i) endobacterium, (ii) AMF, (iii) plant roots
[Figure labels: AMF spore, AMF hypha, endobacteria]

Studying the tripartite system
Potentially strong practical impacts: symbiotic consortia may lead to
  new metabolic pathways
  the appearance of interesting molecules for sustainable agriculture and (possibly) for industrial biotechnological applications
Comparative genomics approach to infer
  phylogenetic relationships
  genome evolution
  metabolic functions of a given organism (also with few available data)
Key part of the study: genomic data of the endobacteria and of the AMF-endobacteria interaction

A computational environment for AMF-endobacteria interaction
Genomic study of the system formed by the AMF Gigaspora margarita (isolate BEG34) and its endobacterium Candidatus Glomeribacter gigasporarum
BIOBITS project, Regione Piemonte - Converging Technologies
Modular architecture
  Database
  Synteny and visualization tools
  BIOBITS research tools
Generic Model Organism Database (GMOD) project: open source tools for creating and managing genome-scale biological databases

Architecture of the system
[Figure: system architecture; label: Flexible retrieval]

Data storage
CHADO DB
  Bacterial genomes, known annotations, proteins and metabolic pathways, and newly discovered annotations
  Manually loaded with the genomes of relatives of Candidatus Glomeribacter
Import modules and RRE
  Queries: information retrieved from the biological databases accessible through the Internet (e.g. GenBank)

Data visualization
GMOD customizable modules for comparative genomics
  CMap allows users to view comparisons of genetic and physical maps
  GBrowse_syn is a synteny browser that displays multiple genomes, with a central reference species
  SyBil is a system for comparative genomics visualizations

New applications (BIOBITS research tools)
Biomart-based tools
  reorganize the information into a data warehouse
  analyze the data by means of clustering and data mining techniques
Flexible retrieval tool
  Case-based reasoning paradigm

Case-based retrieval
Retrieve past cases similar to the current one
Reuse past successful solutions, properly revising them afterwards if necessary
Retain the current case
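The retrieve / reuse / retain steps listed on this slide can be illustrated with a minimal Python sketch. It is not the BIOBITS implementation: the `Case` class, the numeric feature vectors, the `distance` function and the example data are all hypothetical, and the revise step is omitted.

```python
from dataclasses import dataclass


@dataclass
class Case:
    name: str
    features: list   # hypothetical numeric description of the case
    solution: str    # hypothetical stored solution


def distance(a, b):
    """Mean absolute difference between two equally long feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def retrieve(case_base, query, k=1):
    """Return the k past cases closest to the query."""
    return sorted(case_base, key=lambda c: distance(c.features, query))[:k]


# Invented example data, for illustration only.
case_base = [
    Case("case_A", [90.0, 85.0, 40.0], "solution_A"),
    Case("case_B", [30.0, 35.0, 45.0], "solution_B"),
]
query = [88.0, 80.0, 42.0]

best = retrieve(case_base, query)[0]                 # retrieve
proposed = best.solution                             # reuse (revision omitted)
case_base.append(Case("current", query, proposed))   # retain
```

In the tool described here only the retrieval step is the focus; reuse and retain are shown just to complete the cycle.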

Case representation
Sequence of nucleotides, properly aligned with the same reference organism
Percentage of similarity with the aligned nucleotide in the reference organism
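One way to picture this representation is as a series of similarity percentages computed along the alignment. The sketch below is an assumption, not the authors' procedure: the fixed window size, the treatment of gaps (`-`) as mismatches and the function name `window_similarity` are all hypothetical.

```python
def window_similarity(seq, ref, window=4):
    """Percent identity of `seq` against `ref`, window by window.

    Both strings are assumed to be already aligned (same length);
    a gap character '-' never counts as a match.
    """
    if len(seq) != len(ref):
        raise ValueError("aligned sequences must have the same length")
    series = []
    for start in range(0, len(seq), window):
        s = seq[start:start + window]
        r = ref[start:start + window]
        matches = sum(1 for a, b in zip(s, r) if a == b and a != "-")
        series.append(100.0 * matches / len(s))
    return series


# Invented toy alignment, for illustration only.
reference = "ACGTACGTACGT"
aligned = "ACGTACCTAC-A"
case_series = window_similarity(aligned, reference)
```

A case is then the resulting list of percentages, which is the kind of series the abstraction steps on the following slides operate on.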

Case representation

Flexible retrieval
Abstracting the data at different levels in a taxonomy
“Bird’s eye” view of similarity
Example: DCW region (cellular division)
  About 10 genes
  The region is conserved in relatives, while a single gene may not be

Flexible retrieval
Abstracting the data at different granularity levels of “states”
Similar to the (state) Temporal Abstraction technique: from points to intervals sharing a common persistent behavior
Each state is specialized into further subdivisions
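The step from points to intervals can be sketched as follows. The state symbols `H`, `M`, `L` and the thresholds 80 and 50 are hypothetical, chosen only to make the example concrete; the slides do not give the actual state definitions.

```python
from itertools import groupby


def to_state(pct):
    """Map a similarity percentage to a coarse state (assumed thresholds)."""
    if pct >= 80:
        return "H"   # high similarity
    if pct >= 50:
        return "M"   # medium similarity
    return "L"       # low similarity


def to_intervals(values):
    """Merge consecutive points with the same state into (state, start, end)."""
    intervals = []
    pos = 0
    for state, run in groupby(to_state(v) for v in values):
        length = len(list(run))
        intervals.append((state, pos, pos + length - 1))
        pos += length
    return intervals


# Invented similarity series, for illustration only.
series = [95, 90, 85, 60, 55, 20, 92]
intervals = to_intervals(series)
```

A finer level of the taxonomy would split each state further (for example `H` into sub-states) by using additional thresholds in `to_state`.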

Efficient retrieval
Multi-dimensional index structures
Queries at any level of detail
Interactivity

Query answering
Query: similarity string at any detail level (Hv..Hv)
Query generalization to find the index root: Hv..Hv -> H..H -> H
Index navigation backwards with respect to the query generalization steps
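The generalize-then-navigate idea can be sketched with a toy index. Everything here is an assumption made for illustration: the taxonomy in `PARENT` (including the reading of `Hv` as a sub-state of `H`), the nested-dictionary index layout, the case identifiers, and the simplification of the string `Hv..Hv` to just two symbols.

```python
# Hypothetical taxonomy: each fine-grained state maps to its parent state.
PARENT = {"Hv": "H", "Hm": "H", "Lv": "L", "Lm": "L"}

# Hypothetical index: root keys are the most general strings,
# leaves are lists of case identifiers.
INDEX = {
    ("H",): {
        ("H", "H"): {
            ("Hv", "Hv"): ["case_1", "case_7"],
            ("Hv", "Hm"): ["case_3"],
        },
    },
    ("L",): {
        ("L", "L"): {
            ("Lv", "Lv"): ["case_5"],
        },
    },
}


def generalize_symbols(query):
    """Replace each symbol by its parent in the taxonomy (Hv Hv -> H H)."""
    return tuple(PARENT.get(s, s) for s in query)


def merge_runs(query):
    """Collapse runs of equal adjacent symbols (H H -> H)."""
    merged = []
    for s in query:
        if not merged or merged[-1] != s:
            merged.append(s)
    return tuple(merged)


def generalization_chain(query):
    """List of strings from the query up to the index root."""
    chain = [tuple(query)]
    for step in (generalize_symbols, merge_runs):
        nxt = step(chain[-1])
        if nxt != chain[-1]:
            chain.append(nxt)
    return chain


def collect(node):
    """Gather every case stored at or below an index node."""
    if isinstance(node, list):
        return list(node)
    cases = []
    for child in node.values():
        cases.extend(collect(child))
    return cases


def answer(query):
    """Walk the index from the root, reversing the generalization steps."""
    node = INDEX
    for key in reversed(generalization_chain(query)):
        node = node.get(key)
        if node is None:
            return []
    return collect(node)
```

A query posed at a coarser level, such as `("H", "H")`, stops higher in the index and returns every case below that node, which is what allows interactive refinement or generalization of a query.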

Computation time
Efficient retrieval is particularly critical in very large databases (bacterial genome DBs are growing very fast)
Existing implementation in the haemodialysis domain
  1475 real haemodialysis patient cases
  Fast index-based TA retrieval (41 msec on an Intel Core 2 Duo T9400 processor running at 2.53 GHz, equipped with 4 GB of DDR2 RAM)

Conclusions
Modular architecture for in-silico comparative genomics studies of AMF-endobacteria interaction
Flexible genome retrieval tool
  Flexible query definition, at different levels of abstraction
  Efficient index-based retrieval
  Interactive query refinement/generalization

Future work
Complete tool implementation
Experiments on RefSeq NCBI data
Tool usability
New applications published as new GMOD modules