The GMOD Project
Lincoln Stein
Cold Spring Harbor Laboratory
Test Subject: Michael Caudy
o Drosophila neurobiologist
  o Proneural differentiation
  o Notch pathway
  o HLH transcriptional activators/repressors
  o achaete/scute complex
o No computer science training
o Took my “bioinformatics for biologists” course
“Simple” Problem
o Discover the transcription factor binding site code controlling proneural differentiation.
Regular Expression Search
o Using the achaete promoter as exemplar, search for combinations of known binding sites in particular architectures
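A minimal Python sketch of this kind of search, under stated assumptions: the two motifs are generic consensus patterns (E-box CANNTG, N-box CACNAG) chosen only for illustration, and the `find_clusters` helper with its `max_gap` spacing rule is hypothetical. It is not the search that was actually run on the achaete promoter.

```python
import re

# Illustrative motifs (assumptions, not taken from the talk).
E_BOX = "CA[ACGT]{2}TG"   # generic E-box consensus CANNTG
N_BOX = "CAC[ACGT]AG"     # generic N-box consensus CACNAG


def find_clusters(seq, max_gap=50):
    """Find spans where an E-box is followed by an N-box within max_gap bases.

    Returns a list of (start, end) tuples (0-based, end exclusive).
    A lookahead is used so that overlapping candidate clusters are all reported;
    the lazy gap means each start position yields its shortest cluster.
    """
    pattern = re.compile(f"(?=({E_BOX}[ACGT]{{0,{max_gap}}}?{N_BOX}))")
    return [(m.start(1), m.end(1)) for m in pattern.finditer(seq.upper())]


# Toy sequence (made up): an E-box, four spacer bases, then an N-box.
toy = "TTTCAGCTGAAAACACGAGTTT"
clusters = find_clusters(toy, max_gap=10)
```

Tightening `max_gap`, or chaining more motifs into the pattern, is one way to express a "particular architecture" of sites.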
Mike’s Got Lots of Data
o 90-11,000 TF binding site clusters
o 100s-1000s of genes
o Millions of interactions
o Which genes are involved in neural differentiation?
o Which have interactions with the pathway?
o Which have suggestive mutant phenotypes?
Mike Needs a Database
o Database management system for proneural differentiation genes.
o Visualization/exploration tools for relationship of genes to putative TF clusters.
o Literature citations
o Link out to FlyBase, GenBank & other DBs.
o Add notes and other annotations.
Try to do it with FileMaker
o “Cluster-centric” vs “gene-centric”?
o Data import from FlyBase?
o Storing images?
o Maintaining relationships between genes & clusters?
o Updates?
Mike Needs a MOD
o Model Organism Database
o Repository for reagents
  o Stocks, vectors, clones
o Genetic & physical maps
o Large-scale data sets
  o Genome
  o EST sets, microarray results, two-hybrid interactions
o Literature
o Ontologies & Nomenclature
o Meetings, announcements
Example MOD: WormBase
Looking for Sex
An Author Entry
Bibliography
Citation
Gene
Genome
Proteome
Comparative Genomics
Functional Genomics
Anatomy
How WormBase Works
[Diagram components: ACeDB; MySQL; genomic data; images, movies; database access library; Perl scripts; web server; you]
Can Mike reuse WormBase to manage his data? No!
Sorry Mike
o WormBase website difficult to install
o Data model nematode-centric
o Data entry tools very process-specific
o Customization difficult
o Software documentation uneven
o Standard operating procedure documentation uneven
MOD Redux
o SGD, MGD, FlyBase, TAIR, RGD…
o The same basic idea as WormBase
o Implementation entirely different
o Wheel reinvented many times
o Little software sharing
o This madness must stop!
The GMOD Project
o Portable, open source software to support model organism databases
o Multiple MODs involved
  o Worm, fly, yeast, mouse, arabidopsis, rat, monocot, [fugu], [E. coli]
o Funded by NIH as of June 2002
  o Programmers, coordinator, quarterly meetings
GMOD Home Page
The GMOD Pyramid
[Pyramid diagram layers: Open Source DBMS & Middleware; Modular Schema; Modular Applications]
A MOD Construction Set
[Layered diagram across three data domains: genome, genetic maps, literature]
o Database Layer: genomes, maps, citations
o Middleware Layer: Bioperl, BioJava, BioPython
o Application Layer: genome browser, genome editor, map browser, map editor, citation browser, citation editor, annotation pipeline
Chado – Modular Schema
o Common schema for use by FlyBase and WormBase
o Ontology driven
o Small number of generic tables, e.g. “feature”
o Controlled vocabulary names object types and relationships among them:
  o “achaete protein is an HLH activator”
  o “m8 protein inhibits achaete transcription”
o Evidence-savvy
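The idea of a few generic tables typed by a controlled vocabulary can be sketched with an in-memory SQLite database. This is a deliberately simplified illustration: the table and column names are only loosely modeled on Chado-style naming and are assumptions, not the actual Chado DDL, and evidence tracking is left out entirely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cvterm (                 -- controlled vocabulary terms
    cvterm_id INTEGER PRIMARY KEY,
    name      TEXT UNIQUE NOT NULL
);
CREATE TABLE feature (                -- one generic table for all objects
    feature_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    type_id    INTEGER NOT NULL REFERENCES cvterm(cvterm_id)
);
CREATE TABLE feature_relationship (   -- typed links between features
    subject_id INTEGER NOT NULL REFERENCES feature(feature_id),
    type_id    INTEGER NOT NULL REFERENCES cvterm(cvterm_id),
    object_id  INTEGER NOT NULL REFERENCES feature(feature_id)
);
""")


def term(name):
    """Return the id of a vocabulary term, creating it on first use."""
    conn.execute("INSERT OR IGNORE INTO cvterm(name) VALUES (?)", (name,))
    return conn.execute(
        "SELECT cvterm_id FROM cvterm WHERE name = ?", (name,)
    ).fetchone()[0]


def add_feature(name, type_name):
    """Insert a feature whose type is a vocabulary term; return its id."""
    cur = conn.execute(
        "INSERT INTO feature(name, type_id) VALUES (?, ?)", (name, term(type_name))
    )
    return cur.lastrowid


# The two statements from the slide, expressed as rows.
achaete = add_feature("achaete", "HLH activator")
m8 = add_feature("m8", "protein")
conn.execute(
    "INSERT INTO feature_relationship VALUES (?, ?, ?)",
    (m8, term("inhibits transcription of"), achaete),
)

feature_types = conn.execute("""
    SELECT f.name, t.name FROM feature f
    JOIN cvterm t ON t.cvterm_id = f.type_id
    ORDER BY f.feature_id
""").fetchall()

relationships = conn.execute("""
    SELECT s.name, t.name, o.name FROM feature_relationship r
    JOIN feature s ON s.feature_id = r.subject_id
    JOIN cvterm  t ON t.cvterm_id  = r.type_id
    JOIN feature o ON o.feature_id = r.object_id
""").fetchall()
```

Adding a new kind of object or relationship in this design means adding a vocabulary term, not a new table.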
GMOD Applications
o Apollo genome annotation editor
o GBrowse generic genome browser
o PubSearch literature curation editor
o CMap comparative map browser
o IMD insertional mutagenesis database management system
Apollo – BDGP & Sanger Center
Apollo Data Adapters
o Parser -> data models -> display
o Existing data adapters
  o GAME XML
  o GFF
  o Ensembl CGI server
  o DAS
o Write your own data adapter!
  o Extend AbstractDataAdapter class
o Display options defined in config file
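The parser -> data models -> display flow can be illustrated with a small adapter pattern. Apollo itself is a Java application; the Python below is only an analogy, and every name in it (`DataAdapter`, `TabularAdapter`, `Feature`, `display`, and the four-column input format) is hypothetical rather than part of Apollo's API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass
class Feature:
    """Minimal data model shared by all adapters (hypothetical)."""
    seqid: str
    ftype: str
    start: int
    end: int


class DataAdapter(ABC):
    """Stand-in for an abstract adapter base class; not Apollo's real class."""

    @abstractmethod
    def parse(self, text: str) -> List[Feature]:
        """Turn raw input in some format into data-model objects."""


class TabularAdapter(DataAdapter):
    """Adapter for a made-up four-column, tab-separated format."""

    def parse(self, text: str) -> List[Feature]:
        features = []
        for line in text.splitlines():
            if not line.strip() or line.startswith("#"):
                continue  # skip blank lines and comments
            seqid, ftype, start, end = line.split("\t")
            features.append(Feature(seqid, ftype, int(start), int(end)))
        return features


def display(features: List[Feature]) -> List[str]:
    """The 'display' stage: it only sees the data model, never the file format."""
    return [f"{f.seqid}:{f.start}..{f.end} {f.ftype}" for f in features]


rendered = display(TabularAdapter().parse("# demo\nChr1\texon\t10\t20\n"))
```

Supporting another input format then means writing one more subclass, with the data model and display code left untouched.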
Who is Using Apollo?
o BDGP
  o Reannotated Drosophila genome
o Bristol-Myers Squibb
  o Launching Apollo from web browser via MIME types
o GNF
  o JDBC adapter layer over BioSQL
o Biogen
  o View human genome alignment between public and Biogen internal database
  o Connected BLAT pipeline to Apollo
o HGMP-RC Fugu Genomics group
  o Displaying annotations on fugu scaffolds
PubSearch – TAIR & RatDB
PubSearch – Gene Association
IMD – Insertional Mutagenesis Db
CMap – Gramene
CMap – Detailed View
GBrowse – WormBase
GBrowse – Zoomed in
GBrowse – Zoomed Way In
GBrowse – Zoomed Way Way In
GBrowse – Keyword Search
GBrowse – Third Party Annotations
Sequence dumps & other reports
Extensively Customizable
o End-user
  o Turn tracks on and off, change order, change packing & labeling attributes (stored in cookie)
o Data provider
  o Change fonts, colors, text.
  o Change overview: genetic map, contigs, coverage, karyotype.
  o Define new tracks using simple config file.
  o Tinker with track appearance to heart’s content.
Adding a New Track
(a) Create a GFF file named “deletions.gff”
    Chr1 targeted deletion Deletion d101k2
    Chr1 targeted deletion Deletion d680k2
    Chr2 targeted deletion Deletion d007k2
(b) Run the load_gff.pl script
    > load_gff.pl -d example_database deletions.gff
    Loading features…
    Done. 3 features loaded.
(c) Add a new track “stanza” to the gbrowse configuration file
    [Knockout]
    feature = deletion
    glyph = span
    fgcolor = red
    key = Knockouts
    link =
    citation = These are deletion knockouts produced by the example knockout consortium (
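The GFF records in step (a) appear here without their coordinate columns. As a rough sketch of what complete records could look like, the Python below writes nine tab-separated columns in the GFF2 style (sequence, source, feature type, start, end, score, strand, frame, group). The coordinates are invented purely for illustration, and the `gff_line` helper is hypothetical.

```python
def gff_line(seqid, source, ftype, start, end, group,
             score=".", strand=".", frame="."):
    """Format one nine-column, tab-separated GFF2-style record."""
    if start > end:
        raise ValueError("start must not exceed end")
    return "\t".join(
        [seqid, source, ftype, str(start), str(end), score, strand, frame, group]
    )


# Names come from the slide; the coordinates are made up for this example.
deletions = [
    ("Chr1", 1000, 2500, "d101k2"),
    ("Chr1", 8000, 9200, "d680k2"),
    ("Chr2", 300, 1800, "d007k2"),
]

lines = [
    gff_line(chrom, "targeted", "deletion", start, end, f"Deletion {name}")
    for chrom, start, end, name in deletions
]

gff_text = "\n".join(lines) + "\n"  # contents one might save as deletions.gff
```

The third column (`deletion`) is what the `feature = deletion` line in the track stanza of step (c) selects on.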
Extensively Extensible
[Architecture diagram components: Apache Web Server; gbrowse CGI script; Plugins; BioPerl library; Bio::Graphics library; Glyphs; data adaptors (Bio::DB::GFF, Chado, Oracle, Flat File); data sources (MySQL/Postgres, Oracle, Flat Files)]
GBrowse on GenBank? GBrowse on GenBank!
[Architecture diagram components: Apache Web Server; gbrowse CGI script; Plugins; BioPerl library; Bio::Graphics library; Glyphs; GenBank Proxy Adaptor; GenBank; Bio::DB::GFF adaptor; MySQL]
B. burgdorferi via GenBank proxy
Who is Using GBrowse?
o GMOD Members
  o WormBase, FlyBase, RatDB
o HGMP-RC Fugu genomics group
o KEGG (multiple microorganisms)
o Ingenium AG (mouse)
o Bristol-Myers Squibb (Drosophila)
o Texas A&M University (Salmonella)
o McGill University (human chr7)
o Institute of Systems Biology (human)
Genome Knowledgebase (GK)
“Constellation View” (in dev)
[Figure labels: TCA Cycle; Oxidative Decarboxylation; Amino Acid Biosynthesis; Ethanol Catabolism; Glucose Metabolism; RNA Splicing; DNA Replication]
Can Mike use GMOD to manage his data? Almost
Mike’s very own flybase
Uploaded Annotations
Details
Essential Pieces in Progress
o Generic MOD web site
o Strain & phenotype curation tools
o Pathway tools and browsers
o Tree (e.g. phylogenetic) tools & browsers
o Biopipe – genome annotation pipeline
Find out more about GMOD
o Go to
o Examine software matrix
o Find a project you’re interested in
o Contact project leader
o Or contact Scott Cain:
o Or mail
Credits
o CSHL: Adrian Arva, Shuly Avraham, Scott Cain, Ken Clark, Allen Day, Xiaokang Pan
o BDGP: Nomi Harris, Suzanna Lewis, Chris Mungall, John Richter, ShengQiang Shu, Colin Weil
o EBI: Michele Clamp, Stephen Searle
o Carnegie Institute: Sue Rhee, Danny Yoo
o Harvard: David Emmert, Stan Letovsky
o Cornell Medical School: Michael Caudy