BioRuby and the KEGG API Toshiaki Katayama Bioinformatics center, Kyoto U., Japan Toshiaki Katayama Bioinformatics center,

Slides:



Advertisements
Similar presentations
BioRuby + KEGG API + KEGG DAS = wiring knowledge for genome and pathway Toshiaki Katayama Human Genome Center, University of Tokyo, Japan
Advertisements

INTRODUCTION TO BIOPERL Gautier Sarah & Gaëtan Droc.
BioJava in 2002 An Open-Source Java Library for Bioinformatics (Matthew Pocock, BioJava Consulting LTD)
Genome databases and webtools for genome analysis Become familiar with microbial genome databases Use some of the tools useful for analyzing genome Visit.
BioRuby.project("introduction") Toshiaki Katayama bioruby.org/ Bioinformatics Center, Kyoto University, JAPAN.
On line (DNA and amino acid) Sequence Information Lecture 7.
Doug Raiford Lesson 13 5/10/20151Gene networks and pathways.
HCS806 “Methods in Horticulture and Crop Science” Introduction to methods in Bioinformatics for plant science. David Francis (Coordinator) Ian Holford.
Bioinformatics for biomedicine Summary and conclusions. Further analysis of a favorite gene Lecture 8, Per Kraulis
APAN e-Science Workshop e-Bio System for Bio- Knowledge Discovery Sangsoo Kim Nat’l Genome Informat’n Ct. Korea Res. Inst. of Biosci. & Biotech.
KEGG: Kyoto Encyclopedia of Genes and Genomes Susan Seo Intro to Bioinformatics Fall 2004.
Archives and Information Retrieval
Biological databases.
Essential Bioinformatics and Biocomputing (LSM2104: Section I) Biological Databases and Bioinformatics Software Prof. Chen Yu Zong Tel:
The Cell, Central Dogma and Human Genome Project.
1 Databases in Bioinformatics (Roald Forsberg). 2 Overview The role of databases in bioinformatics The structure of databases –Relational databases –Database.
Bioperl modules.
Signaling Pathways and Summary June 30, 2005 Signaling lecture Course summary Tomorrow Next Week Friday, 7/8/05 Morning presentation of writing assignments.
Bioinformatics Lecture 3 BCH 550 Arjumand Warsy. Retrieving Protein Sequences.
Application of Bioinformatics in Genetics Research Instructors: Dr. Henry Baker Dr. Luciano Brocchieri Drs. Michele Tennant / & Rolando Milian Dr. Lei.
An Introduction to Bioinformatics Molecular Biology Databases.
BioPerl. cpan Open a terminal and type /bin/su - start "cpan", accept all defaults install Bio::Graphics.
Course Module: Introduction to Bioinformatics – CS 2001 July CS Databases.
>>> Korean BioInformation Center >>> KRIBB Korea Research institute of Bioscience and Biotechnology GS2PATH: Linking Gene Ontology and Pathways Jin Ok.
Ch10. Intermolecular Interactions and Biological Pathways
BioPerl - documentation Bioperl tutorial tutorial Mastering Perl for Bioinformatics: Introduction.
ComPath Comparative Metabolic Pathway Analyzer Kwangmin Choi and Sun Kim School of Informatics Indiana University.
Databases in Bioinformatics and Systems Biology Carsten O. Daub Omics Science Center RIKEN, Japan May 2008.
Advancing Science with DNA Sequence Data Curation in IMG-ER Natalia Ivanova MGM Workshop May 16, 2012.
Bioinformatics for biomedicine
BioPython Workshop Gershon Celniker Tel Aviv University.
NCBI’s Bioinformatics Resources Michele R. Tennant, Ph.D., M.L.I.S. Health Science Center Libraries U.F. Genetics Institute January 2015.
Web-based/Open-source Tools for Bioinformatics and Genome Analysis.
Biological Databases By : Lim Yun Ping E mail :
Doug Raiford Lesson 3.  More and more sequence data is being generated every day  Useless if not made available to other researchers.
Supporting High- Performance Data Processing on Flat-Files Xuan Zhang Gagan Agrawal Ohio State University.
Sequence Retrieving, Manipulation and Management BIOINFORMATICS Lecture 3.
1 Review of Biological Database Utilization. 2 Biological Databases We will discuss: Usefulness to the bioinformaticist Database types Search methods.
Part I: Identifying sequences with … Speaker : S. Gaj Date
UMR ASP UMR ASP Structural & Comparative Genomics in Bread Wheat TriAnnotPipeline A LifeGrid Project based on AUVERGRID F. Giacomoni, M.
Functional Annotation 基因功能预测 唐海宝 基因组与生物技术研究中心 2013 年 11 月 23 日.
BIOINFORMATIK I UEBUNG 2 mRNA processing.
BLOCKS Multiply aligned ungapped segments corresponding to most highly conserved regions of proteins- represented in profile.
Savita Shrivastava Feb 25 th, 2005 Lab Presentation BASys A Web Server for Automated Bacterial Annotation.
EMBOSS over a Grid 1. 1st EELA Grid School December 4th of 2006 Eduardo MURRIETA LEON Romualdo ZAYAS-LAGUNAS Pierre-Alain BRANGER Jérôme VERLEYEN Roberto.
Rice Proteins Data acquisition Curation Resources Development and integration of controlled vocabulary Gene Ontology Trait Ontology Plant Ontology
BioRuby 2005 Toshiaki Katayama Human Genome Center,University of Tokyo, Japan Toshiaki Katayama Human Genome Center,University.
BioPerl Ketan Mane SLIS, IU. BioPerl Perl and now BioPerl -- Why ??? Availability Advantages for Bioinformatics.
Cool BaRC Web Tools Prat Thiru. BaRC Web Tools We have.
March 28, 2002 NIH Proteomics Workshop Bethesda, MD Lai-Su Yeh, Ph.D. Protein Scientist, National Biomedical Research Foundation Demo: Protein Information.
Copyright OpenHelix. No use or reproduction without express written consent1 1.
Practice – file types (Cont.) Load the “Mysequence.doc” file to Webcutter using “Choose file” and then “Upload sequence file”. -Notice that the “sequence”
Importing KEGG pathway and mapping custom node graphics on Cytoscape Kozo Nishida Keiichiro Ono Cytoscape retreat 2010 University of Michigan Jul 18, 2010.
Advanced Perl For Bioinformatics Part 1 2/23/06 1-4pm Module structure Module path Module export Object oriented programming Part 2 2/24/06 1-4pm Bioperl.
Welcome to the Protein Database Tutorial. This tutorial will describe how to navigate the section of Gramene that provides collective information on proteins.
 What is MSA (Multiple Sequence Alignment)? What is it good for? How do I use it?  Software and algorithms The programs How they work? Which to use?
GeneConnect Use Cases and Design August 3, GeneConnect Database IDs are linked by Direct Annotation, Inferred Annotation, or Sequence Alignment.
MARC: Developing Bioinformatics Programs Alex Ropelewski PSC-NRBSC Bienvenido Vélez UPR Mayaguez Essential BioPython: Overview 1.
Demo: Protein Information Resource
생물정보학 Bioinformatics.
Mangaldai College, Mangaldai
Welcome to the Protein Database Tutorial
KEY CONCEPT Entire genomes are sequenced, studied, and compared.
Comparative Genomics.
Lesson 3 Bioinformatics Laboratory
KEY CONCEPT Entire genomes are sequenced, studied, and compared.
Supporting High-Performance Data Processing on Flat-Files
KEY CONCEPT Entire genomes are sequenced, studied, and compared.
SUBMITTED BY: DEEPTI SHARMA BIOLOGICAL DATABASE AND SEQUENCE ANALYSIS.
Overview of Enzyme, Protein and Network Databases
Presentation transcript:

BioRuby and the KEGG API Toshiaki Katayama Bioinformatics center, Kyoto U., Japan Toshiaki Katayama Bioinformatics center, Kyoto U., Japan

Use the source!

What is BioRuby?  Yet another BioPerl written in Ruby  since Nov 2000  Developed in Japan  includes support for Japanese resources like KEGG  Yet another BioPerl written in Ruby  since Nov 2000  Developed in Japan  includes support for Japanese resources like KEGG

So, what is Ruby?  Scripting language  clean syntax, v. easy to use/read/write  Everything is object  Integer, String, Regexp, Exception etc. w/o exception  Created by Japanese author 'matz'  Scripting language  clean syntax, v. easy to use/read/write  Everything is object  Integer, String, Regexp, Exception etc. w/o exception  Created by Japanese author 'matz'

So, what is Ruby? Pre-installed in Mac OS X !

What the Ruby code looks like? #!/usr/bin/ruby puts "Hello world!" #!/usr/bin/ruby puts "Hello world!"

What the BioRuby code looks like? #!/usr/bin/env ruby require 'bio' gene = Bio::Seq::NA.new("catgaattattgtagannntgataaagacttgac") prot = gene.translate # => "HELL*XW*RLD" (Bio::Seq::AA object) puts prot.split('X').join(' ').capitalize.gsub(/\*/, 'o') << '!' # => "Hello World!" #!/usr/bin/env ruby require 'bio' gene = Bio::Seq::NA.new("catgaattattgtagannntgataaagacttgac") prot = gene.translate # => "HELL*XW*RLD" (Bio::Seq::AA object) puts prot.split('X').join(' ').capitalize.gsub(/\*/, 'o') << '!' # => "Hello World!"

Bio::Sequence  Bio::Sequence (aliased to Bio::Seq)  seq.composition  seq.window_search  seq.randomize  seq.to_fasta  Bio::Sequence::NA  seq.complement  seq.splicing  seq.translate  seq.to_re  Bio::Sequence::AA  seq.molecular_weight  Bio::Sequence (aliased to Bio::Seq)  seq.composition  seq.window_search  seq.randomize  seq.to_fasta  Bio::Sequence::NA  seq.complement  seq.splicing  seq.translate  seq.to_re  Bio::Sequence::AA  seq.molecular_weight #!/usr/bin/env ruby require 'bio' seq = Bio::Seq::NA.new("atggcttcagt.. seq.window_search(15, 3) do |sub| puts sub.translate.molecular_weight end #!/usr/bin/env ruby require 'bio' seq = Bio::Seq::NA.new("atggcttcagt.. seq.window_search(15, 3) do |sub| puts sub.translate.molecular_weight end

What BioRuby can do more?  Biological sequence manipulations  Run applications (Blast etc.) and parse its report  Database entry retrieval, parsing  PubMed reference DB search, reformatting  Accessing OBDA, DAS, KEGG API  Biological sequence manipulations  Run applications (Blast etc.) and parse its report  Database entry retrieval, parsing  PubMed reference DB search, reformatting  Accessing OBDA, DAS, KEGG API

Bio::Applications  Bio::Blast, Fasta, HMMER, EMBOSS  Bio::Genscan  Bio::PSORT, TargetP  Bio::SOSUI, TMHMM  each contains Report class for parsing the results  Bio::Blast, Fasta, HMMER, EMBOSS  Bio::Genscan  Bio::PSORT, TargetP  Bio::SOSUI, TMHMM  each contains Report class for parsing the results

example #!/usr/bin/env ruby require 'bio' File.open("my_blast_output.xml") do |file| Bio::Blast.reports(file) do |report| report.hits do |hit| puts hit.num, hit.query_id, hit.target_id, hit.bit_score, hit.evalue, hit.overlap end #!/usr/bin/env ruby require 'bio' File.open("my_blast_output.xml") do |file| Bio::Blast.reports(file) do |report| report.hits do |hit| puts hit.num, hit.query_id, hit.target_id, hit.bit_score, hit.evalue, hit.overlap end

Bio::DB  Bio::GenBank, EMBL, SwissProt,...  Bio::FastaFormat, GFF, GO  Bio::MEDLINE, LITDB  Bio::TRANSFAC, PROSITE  Bio::FANTOM  Bio::KEGG::GENES, ENZYME, Microarray,...  Bio::AAindex  Bio::GenBank, EMBL, SwissProt,...  Bio::FastaFormat, GFF, GO  Bio::MEDLINE, LITDB  Bio::TRANSFAC, PROSITE  Bio::FANTOM  Bio::KEGG::GENES, ENZYME, Microarray,...  Bio::AAindex

OBDA -- Open Bio* DB Access  Bio::Registry  Bio::Fetch  Bio::FlatFile  Bio::SQL +  Bio::DAS  Bio::Registry  Bio::Fetch  Bio::FlatFile  Bio::SQL +  Bio::DAS

OBDA example  ~/.bioinformatics/seqdatabase.ini [embl] protocol=biofetch location= dbname=embl [swissprot] protocol=biosql location=db.bioruby.org dbname=biosql driver=mysql biodbname=sp [genbank] protcol=flat location=/export/database/ dbname=genbank  ~/.bioinformatics/seqdatabase.ini [embl] protocol=biofetch location= dbname=embl [swissprot] protocol=biosql location=db.bioruby.org dbname=biosql driver=mysql biodbname=sp [genbank] protcol=flat location=/export/database/ dbname=genbank #!/usr/bin/env ruby require 'bio' reg = Bio::Registry.new sp = reg.get_database('swissprot') puts sp.get_by_id('CYC_BOVIN') gb = reg.get_database('genbank') puts gb.get_by_id('AA2CG') #!/usr/bin/env ruby require 'bio' reg = Bio::Registry.new sp = reg.get_database('swissprot') puts sp.get_by_id('CYC_BOVIN') gb = reg.get_database('genbank') puts gb.get_by_id('AA2CG')

What is KEGG?  GENOME, GENES  PATHWAY  SSDB etc.  GENOME, GENES  PATHWAY  SSDB etc.

KEGG in GMOD   Import KEGG/GENES and KEGG/PATHWAY into GMOD browser (converted by BioRuby)  Over 100 organisms in unified form  linked to GENES and PATHWAY database   Import KEGG/GENES and KEGG/PATHWAY into GMOD browser (converted by BioRuby)  Over 100 organisms in unified form  linked to GENES and PATHWAY database

KEGG API   SOAP/WSDL based web service  XML, HTTP  Proteome and pathway analysis  KEGG/GENES  KEGG/SSDB  KEGG/PATHWAY   SOAP/WSDL based web service  XML, HTTP  Proteome and pathway analysis  KEGG/GENES  KEGG/SSDB  KEGG/PATHWAY

example #!/usr/bin/env ruby require 'bio' serv = Bio::KEGG::API.new puts serv.get_best_neighbors_by_gene('eco:b0002') # serv.get_best_neighbors_by_gene('eco:b0002', 500) # serv.get_best_neighbors_by_gene('eco:b0002', 500, ['hin', 'syn']) list = ['ec: ', 'ec: '] puts serv.get_genes_by_enzymes(list, ['eco', 'syn']) list = ['eco:b0600', 'eco:b1190'] puts serv.mark_all_pathways_by_genes('eco', list) #!/usr/bin/env ruby require 'bio' serv = Bio::KEGG::API.new puts serv.get_best_neighbors_by_gene('eco:b0002') # serv.get_best_neighbors_by_gene('eco:b0002', 500) # serv.get_best_neighbors_by_gene('eco:b0002', 500, ['hin', 'syn']) list = ['ec: ', 'ec: '] puts serv.get_genes_by_enzymes(list, ['eco', 'syn']) list = ['eco:b0600', 'eco:b1190'] puts serv.mark_all_pathways_by_genes('eco', list) ISMB poster D-25

Example : Ortholog cluster and common motif #!/usr/bin/ruby require 'bio' serv = Bio::KEGG::API.new homologs = serv.get_all_neighbors_by_gene('hsa:7368', 500) pfams = {} homologs.each do |g| motifs = serv.get_common_motifs_by_genes(g[0]) motifs.each do |m| if m[0] =~ /pf:/ if pfams.key?(m) pfams[m] << g[0] else pfams[m] = [g[0]] end pfams.each do |m| p m end #!/usr/bin/ruby require 'bio' serv = Bio::KEGG::API.new homologs = serv.get_all_neighbors_by_gene('hsa:7368', 500) pfams = {} homologs.each do |g| motifs = serv.get_common_motifs_by_genes(g[0]) motifs.each do |m| if m[0] =~ /pf:/ if pfams.key?(m) pfams[m] << g[0] else pfams[m] = [g[0]] end pfams.each do |m| p m end

Example: Gene expression and Pathway analysis by KEGG API serv = Bio::KEGG::API.new regulated_genes.each do |gene| map_id = serv.get_pathways_by_genes(["bsu:#{gene[0]}"]) map_id.each do |entry| entry.gsub!('map', 'bsu') entry.gsub!('path', 'map') if map_gene.key?(entry) map_gene[entry] << gene[0] else map_gene[entry] = [gene[0]] end map_gene.each do |k, v| colors = [] v.each do |g| colors << "##{ma.orf2rgb[g]}" end p serv.color_pathway_by_genes("bsu:#{k}", v, colors) end serv = Bio::KEGG::API.new regulated_genes.each do |gene| map_id = serv.get_pathways_by_genes(["bsu:#{gene[0]}"]) map_id.each do |entry| entry.gsub!('map', 'bsu') entry.gsub!('path', 'map') if map_gene.key?(entry) map_gene[entry] << gene[0] else map_gene[entry] = [gene[0]] end map_gene.each do |k, v| colors = [] v.each do |g| colors << "##{ma.orf2rgb[g]}" end p serv.color_pathway_by_genes("bsu:#{k}", v, colors) end

Acknowledgement  BioRuby developers  Naohisa Goto, Mitsuteru Nakao, Yoshinori Okuji, Shuichi Kawashima, Masumi Itoh and many more.  KEGG and KEGG API curators / developers  Bioinformatics center, Kyoto University, Japan  Yoko Sato, Shuichi Kawashima, Minoru Kanehisa  Open Bio* community  BioRuby developers  Naohisa Goto, Mitsuteru Nakao, Yoshinori Okuji, Shuichi Kawashima, Masumi Itoh and many more.  KEGG and KEGG API curators / developers  Bioinformatics center, Kyoto University, Japan  Yoko Sato, Shuichi Kawashima, Minoru Kanehisa  Open Bio* community

BioRuby is out! 25 Jun 2003