Porting CHADO and GMOD Tools to Oracle and Integration with dictyBase
Eric Just
dictyBase, http://dictybase.org
Center for Genetic Medicine, Northwestern University



WHY?
- dictyBase based on SGD
- Increase flexibility in feature storage
- Want to use CHADO for feature data, but ‘dicty’ SGD schema for the rest
- ‘dicty’ SGD (Oracle) needs to link to CHADO
Eric Just - dictyBase – Northwestern University

Schema porting
- SQL Fairy did most of this, but:
  - Had to tweak Oracle Producer
  - Object name limited to 30 characters, systematically truncate names
  - Unique/primary keys on CLOBs (text) not allowed, changed to varchar2(4000)
  - ‘SYNONYM’ reserved name in Oracle, changed name to ‘SYNONYM_’
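The slide does not say how names were truncated. The following is a minimal sketch of one possible systematic scheme, assuming a numeric suffix is used to keep shortened names unique. The function name `truncate_names` and the example identifiers are hypothetical and are not taken from the SQL Fairy Oracle producer.

```perl
use strict;
use warnings;

# Identifier length limit stated on the slide.
my $MAX_LEN = 30;

# Map each identifier to a name of at most $MAX_LEN characters.
# Assumed scheme: names that already fit are kept unchanged; longer names
# are cut and given a numeric suffix so that no two results collide.
sub truncate_names {
    my @names = @_;
    my (%used, %map);

    # Reserve the names that already fit so truncated names cannot clash with them.
    $used{$_} = 1 for grep { length($_) <= $MAX_LEN } @names;

    for my $name (@names) {
        next if exists $map{$name};
        if (length($name) <= $MAX_LEN) {
            $map{$name} = $name;
            next;
        }
        my $n = 1;
        my $short;
        do {
            my $suffix = "_$n";
            $short = substr($name, 0, $MAX_LEN - length($suffix)) . $suffix;
            $n++;
        } while ($used{$short});
        $used{$short} = 1;
        $map{$name}   = $short;
    }
    return \%map;
}

# Hypothetical identifiers, for illustration only.
my $map = truncate_names(
    'feature',
    'hypothetical_feature_relationship_index_one',
    'hypothetical_feature_relationship_index_two',
);
printf "%-45s -> %s\n", $_, $map->{$_} for sort keys %$map;
```

Keeping the mapping in one place matters because every later step (Class::DBI table setup, hand-written SQL) has to refer to the shortened names consistently.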

Class::DBI
- Class::DBI provides nice ‘table level’ abstraction
- CRUD, follow references WITHOUT WRITING SQL
- Excellent tool for portability
- GMOD ships with Class::DBI configured for CHADO
- Had to fix/customize Oracle Driver

AutoDBI
- Package which loads Class::DBI classes for each table
- Keep class name Chado::Synonym but call set_up_table( 'synonym_' )
- Made ‘residues’ a ‘lazy’ column of Chado::Feature
- No other significant porting needed

Data Migration
Flow: ‘dicty’ SGD -> GFF3 -> CHADO
- Export chromosome sequences and locations in GFF3
- Load GFF3 into CHADO schema
- Update references to features with new tables and IDs
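To make the export step concrete, here is a minimal sketch of writing one GFF3 feature line for a chromosome. The helper `gff3_line`, the identifier `chr_example`, and the coordinates are hypothetical; the talk does not show the actual export code. Attribute values are written unescaped, which assumes they contain no characters that GFF3 reserves.

```perl
use strict;
use warnings;

# Format one GFF3 feature line: nine tab-separated columns
# (seqid, source, type, start, end, score, strand, phase, attributes).
sub gff3_line {
    my (%f) = @_;
    my $attrs = join ';',
        map { "$_=$f{attributes}{$_}" } sort keys %{ $f{attributes} };
    return join("\t",
        $f{seqid}, $f{source}, $f{type}, $f{start}, $f{end},
        '.',                  # score: not applicable here
        $f{strand} // '.',    # strand: '.' when unstranded
        '.',                  # phase: only meaningful for CDS features
        $attrs,
    ) . "\n";
}

# Hypothetical chromosome record, for illustration only.
print "##gff-version 3\n";
print gff3_line(
    seqid      => 'chr_example',
    source     => 'dictyBase',
    type       => 'chromosome',
    start      => 1,
    end        => 1000,
    strand     => '+',
    attributes => { ID => 'chr_example', Name => 'chr_example' },
);
```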

GBrowse porting
- ‘rows’ method does not exist in Oracle DBI Driver
    GBrowse code:
        $sth->execute or Bio::Root::Root->throw();
        if ($sth->rows() == 0) {…}
    Oracle port:
        my $rows_returned = $sth->execute or Bio::Root::Root->throw();
        if ( $rows_returned == 0) {…}
- Oracle fetchrow_hashref() is case sensitive
    GBrowse code:
        $sth->fetchrow_hashref()
    Oracle port:
        $sth->fetchrow_hashref("NAME_lc")

GBrowse porting - Queries
- Oracle does not like anything in a ‘using’ clause to also be in the ‘where’ clause
    Original:
        select f.feature_id, f.name, fl.fmin, fl.fmax
        from feature f join featureloc fl using (feature_id)
        where f.feature_id = and fl.rank=0;
    Oracle:
        select f.feature_id, f.name, fl.fmin, fl.fmax
        from feature f join featureloc fl on f.feature_id = fl.feature_id
        where f.feature_id = and fl.rank=0;
- ‘substring’ becomes ‘substr’
- Any SQL containing synonym table must be modified
- Any procedural SQL must be reproduced, in some cases this can be avoided
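Two of the fixes listed above are mechanical enough to sketch as text substitutions. This is only an illustration, assuming queries use the function-call form `substring(x, start, length)` and refer to the renamed table as plain `synonym`. The helper `port_sql_to_oracle` is hypothetical and is not part of GBrowse or GMOD. It does not rewrite ‘using’ joins or the `substring(x from a for b)` form, both of which still need manual changes.

```perl
use strict;
use warnings;

# Apply two simple textual fix-ups to a SQL string (a sketch, not a SQL parser).
sub port_sql_to_oracle {
    my ($sql) = @_;

    # Function-call form only: substring(x, 1, 10) becomes substr(x, 1, 10).
    $sql =~ s/\bsubstring\s*\(/substr(/gi;

    # Rename the 'synonym' table. Because '_' is a word character, \b leaves
    # feature_synonym, synonym_id and an already renamed synonym_ untouched.
    $sql =~ s/\bsynonym\b/synonym_/gi;

    return $sql;
}

my $sql = 'select substring(s.name, 1, 10) from synonym s '
        . 'join feature_synonym fs on s.synonym_id = fs.synonym_id';
print port_sql_to_oracle($sql), "\n";
```

A regex pass like this can also touch string literals or comments that happen to contain the same words, so its output should be reviewed rather than trusted blindly.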

Tuning
- Added is_deleted flag to feature table
- Added some audit columns
- Added audit table and triggers
- Created indexes heuristically
- Added hints to some difficult queries

Integrating into dictyBase I
Diagram labels:
- dictyBase Presentation Layer
- Various middleware and presentation objects
- dictyBase Object Model
- ‘Dbtable’ database abstraction layer
- ‘dicty’ SGD

Integrating into dictyBase II
Diagram labels:
- dictyBase Presentation Layer
- Various middleware and presentation objects
- dictyBase Object Model
- ‘Dbtable’ layer
- Class::DBI layer
- ‘dicty’ SGD
- CHADO

dictyBase Objects
- Retrieve, insert, update, delete
- Interface ignorant of schema
- No presentation in data classes
- Easy to use interfaces
- Tuned with lazy evaluation (most accessors)
- 75 – 80% unit test coverage
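The lazy evaluation mentioned above can be sketched with a small accessor that fetches a value only on first use. `LazyFeature` and its `loader` callback are hypothetical stand-ins for a data class and its database query. They are not the dictyBase classes, whose implementation the talk does not show.

```perl
package LazyFeature;    # hypothetical class, not part of dictyBase
use strict;
use warnings;

sub new {
    my ($class, %args) = @_;
    # 'loader' is a code ref standing in for a database query.
    return bless { id => $args{id}, loader => $args{loader} }, $class;
}

sub id { $_[0]{id} }

# Fetch the sequence on first access only, then serve it from the object.
sub residues {
    my ($self) = @_;
    $self->{residues} = $self->{loader}->( $self->{id} )
        unless exists $self->{residues};
    return $self->{residues};
}

package main;

my $calls   = 0;
my $feature = LazyFeature->new(
    id     => 42,
    loader => sub { $calls++; return 'ATGC' },
);

print $feature->residues, "\n";    # first access runs the loader
print $feature->residues, "\n";    # second access reuses the stored value
print "loader calls: $calls\n";
```

The same idea applies to the ‘lazy’ residues column: large sequence data is only read when a caller actually asks for it.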

Use BioPerl
- Use Bio::Seq to represent sequences
- Use Bio::SeqFeature objects to represent transcript and alignment locations
- Harness the power of BioPerl for sequence tasks, file generation
- NOTE: BioPerl only used for sequence and location

Class Diagram
Diagram labels:
- Feature
- Aligned, mRNA, Contig, Chromosome
- getOverlappingFeatures(), getOverlappingAlignments()
- Bio::SeqFeature::Gene::Transcript, Bio::SeqFeature::Generic, Bio::Seq

Object use case: Add an Exon, dbxref, and Description

    #!perl
    use dicty::Feature;
    use Bio::SeqFeature::Gene::Exon;

    # Numeric arguments were lost in the slide export and are left blank.
    my $transcript = dicty::Feature->new( -feature_no => );

    # Set the description and attach an external ID (dbxref).
    $transcript->description( 'Gene model derived from AU12345' );
    $transcript->add_external_id( -source => 'GenBank Accession Number', -id => 'AU12345' );

    # Work on the BioPerl representation of the transcript.
    my $bioperl = $transcript->bioperl();
    [$bioperl->exons()]->[2]->start( );

    # Build a new coding exon and add it to the transcript.
    my $exon = Bio::SeqFeature::Gene::Exon->new( -start => , -end => , -strand => -1 );
    $exon->is_coding(1);
    $bioperl->add_exon($exon);

    # Write all changes back to the database.
    $transcript->update();

Using Apollo
Diagram labels: Chado, Object layer, GenBank file, Apollo
- Request segment through SOAP message over HTTP
- Object layer generates GenBank file
- Send GenBank file via SOAP message
- Modify in Apollo, send changed gene models back via SOAP
- Adaptor changes gene models and updates the database

New Curation Tools
- Gene and Feature curation had to be rewritten
- ‘Gene centric’ curation
- Added more evidence qualifiers
- Presentation classes that manipulate Object Layer

Where Are We Going
- Utilize the flexibility – new feature types, feature relations, and SO
- Contribute back to GMOD
- Gradually port different areas into CHADO
- Provide feedback and testing ground for database independence

Acknowledgments
dictyBase
- PIs: Rex Chisholm, PhD; Warren Kibbe, PhD
- Programmer: Sohel Merchant
- Curators: Petra Fey; Pascale Gaudet, PhD; Karen Pilcher
Bioinformatics Core at Northwestern
Other Groups
- SGD
- GMOD: CHADO, GBrowse, Apollo
- BioPerl
Funding
- NIH (NIGMS and NHGRI)