Published by Jörg Fuhrmann. Modified over 6 years ago.
1
Apollo Progress Report
Nomi Harris and Mark Gibson
Berkeley Drosophila Genome Project
GMOD Meeting, SRI
May 16-17, 2005
2
[MARK: please spend a minute introducing Apollo.]
We're assuming all of you in the audience are familiar with Apollo, GMOD's annotation curation tool.
GBrowse: view annotations (as a GIF in a web page).
Apollo: view, edit, and create annotations based on computational results; it lets you modify transcript structures and edit auxiliary information associated with annotations.
Apollo is a Java application and is easy to install on your computer.
3
Improvements Since Release 1.4.6 (July 2004)
Transactions
Chado JDBC adapter
ChadoXML adapter
Selected minor improvements
[When talking through this slide, mention that Mark will discuss the first two points, and then Nomi will talk about the ChadoXML adapter, recent minor improvements, and future plans.]
4
Chado Roundtrip Options
[Diagram: round-trip routes between Apollo and the Chado DB. Labels: GAME Adapter, GAME XML, G2C/C2G; ChadoXML Adapter, Chado XML, XORT; ChadoTrans Adapter, ChadoTrans XML; JDBC Adapter; Chado DB.]
5
Improvements Since Release 1.4.6 (July 2004)
Transactions
Chado JDBC adapter
ChadoXML adapter
Selected minor improvements
6
Transactions & Integrated DB
"Integrated" DB has non-Apollo data
"Wipeout & insert" will have a hard time preserving non-Apollo data
Transaction writeback does not affect non-Apollo data
Deleted objects are missing from the saved data file; transactions are needed to keep track of them

Talking elaboration: transaction writeback does not affect non-Apollo data because only data changed inside Apollo is saved. This is a crucial point that I didn't really arrive at until after the last GMOD meeting, and I think it really needs to be made. Not sure where to place it in the talk.

"Wipeout & insert" means using Chado XML or GAME XML without transactions: wiping out the old gene in the database and inserting the contents of the Chado XML/GAME XML. This is the current FlyBase paradigm. Presumably, when the gene is deleted, all non-Apollo data will be lost and not reinserted. "A hard time" is an understatement: I don't see how wipeout & insert will be able to preserve non-Apollo data at all. It would take a lot of hacking of some sort, maybe the equivalent of transactions on the DB side, or perhaps clear delineations in Chado between Apollo and non-Apollo data, which seems non-trivial.

[Well, I managed to capture some non-Apollo fields from the ChadoXML just by saving them as properties. So it's not necessarily impossible. --NH]

[Right, but that's a few things, and often it's stuff that could conceivably be a part of Apollo, stuff related to Apollo data. When fly goes to a full-blown integrated DB (all the Cambridge genetic data), it would be nuts to carry ALL of that through Apollo. At least I think it would be. --MG]

It should be noted that transactions aren't all the way there yet: merges and splits are done with wipeouts for now, and the same goes for type changes. This needs to be amended at some point, but the principle is there.

Maybe this slide should go early on? Might just make the delete point a talking point; not sure if it needs a bullet. [right]

[I moved this to before you start talking about transactions and writeback, so the idea is, first, why do we need transactions, and then, what do transactions look like and how do they work? --NH]

[OK, I'll buy that: sort of a transaction motivator, I guess, rather than a post-analysis.]
7
Transactions & Writeback
[Diagram: transaction writeback flow. Labels: User, Apollo, Edit, Undo, Transaction Manager (list), Transaction XML (interim save), Coalesce Transactions, Chado Transaction Transformer, Chado Transaction, Chado Transaction XML, Chado SQL, XORT, JDBC, Chado DB.]
[Note: I'm using "Transaction XML", not "GAME transaction XML". It's not GAME, and it can be used by both GAME and Chado XML, so I think "Transaction XML" works.]
8
Apollo Transactions
[Diagram labels: User, Apollo, Transaction (edit), Transaction Manager (trans list).]
Capture fine-grained edits
Transaction object contains:
  Operation (add, delete, update)
  Feature (gene, transcript, exon)
  Subpart (none, comment, name…)
One "action" can result in many transactions
[The one action -> many transactions point is the motivation for compound transactions.]
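The transaction object described on this slide could be sketched roughly as follows. This is only an illustration: the class, enum, and method names (`EditTransaction`, `TransactionList`, `renameGene`, and so on) are hypothetical and are not taken from the Apollo source.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// All names here are hypothetical; Apollo's actual classes may differ.
enum Operation { ADD, DELETE, UPDATE }
enum FeatureKind { GENE, TRANSCRIPT, EXON }
enum Subpart { NONE, COMMENT, NAME }

/** One fine-grained edit: what was done, to which feature, on which subpart. */
final class EditTransaction {
    final Operation operation;
    final FeatureKind feature;
    final Subpart subpart;
    final String featureId;

    EditTransaction(Operation operation, FeatureKind feature, Subpart subpart, String featureId) {
        this.operation = operation;
        this.feature = feature;
        this.subpart = subpart;
        this.featureId = featureId;
    }
}

/** Keeps the ordered list of transactions for an editing session. */
final class TransactionList {
    private final List<EditTransaction> transactions = new ArrayList<>();

    void record(EditTransaction t) {
        transactions.add(t);
    }

    /** One user action (renaming a gene) fans out into several transactions. */
    void renameGene(String geneId, List<String> transcriptIds) {
        record(new EditTransaction(Operation.UPDATE, FeatureKind.GENE, Subpart.NAME, geneId));
        for (String id : transcriptIds) {
            record(new EditTransaction(Operation.UPDATE, FeatureKind.TRANSCRIPT, Subpart.NAME, id));
        }
    }

    List<EditTransaction> list() {
        return Collections.unmodifiableList(transactions);
    }
}
```

Renaming a gene with two transcripts records three transactions here, which is the "one action, many transactions" case.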
9
Coalesce Transactions
[Diagram labels: Transaction Manager (trans list), Coalesce Transactions.]
Filter out redundant edits
Done at commit time

It's pretty good but not perfect yet: some redundant edits still get through. That's not the end of the world; it just means a few extra commits to the database.

Note: coalescing has to happen at commit time, not during the editing session, for undo to work properly.

Talking point (taken out as a bullet, as it's too obscure without undo/compound transactions before it): currently, coalescing flattens out compound transactions. This is problematic if you want split merges (TAIR). This will be amended. (I think I'll add this as a bullet.) Well, I moved the undo slide that brought up compound transactions, so now this seems funny. Actually, I touch on it in the previous slide, so it's not too bad; I don't really go into it here.
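As a rough illustration of filtering redundant edits at commit time, the sketch below collapses repeated updates to the same feature and subpart, keeping only the last value. The rule and the names (`UpdateEdit`, `CoalesceSketch`) are assumptions chosen for the example; the slides do not spell out Apollo's actual coalescing rules.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical update edit: sets one subpart of one feature to a new value. */
final class UpdateEdit {
    final String featureId;
    final String subpart;
    final String newValue;

    UpdateEdit(String featureId, String subpart, String newValue) {
        this.featureId = featureId;
        this.subpart = subpart;
        this.newValue = newValue;
    }
}

final class CoalesceSketch {
    /**
     * Assumed rule: later updates to the same (feature, subpart) supersede
     * earlier ones, so only the last update per key is committed.
     */
    static List<UpdateEdit> coalesce(List<UpdateEdit> edits) {
        Map<String, UpdateEdit> latest = new LinkedHashMap<>();
        for (UpdateEdit e : edits) {
            String key = e.featureId + "\u0000" + e.subpart;
            latest.remove(key);   // re-insert so ordering follows the last edit
            latest.put(key, e);
        }
        return new ArrayList<>(latest.values());
    }
}
```

Because the full edit list is kept untouched until commit, an undo during the session can still step back through every individual edit.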
10
Interim save for transactions
[Diagram labels: XML Adapter, Transaction XML (interim save), Transaction Manager (trans list).]
Saves transactions in a separate XML file
GAME & Chado XML adapters save transactions
Depends on accompanying data file

[OK, maybe I should be clearer, but I can do that through talking. What I mean is that transaction XML is saved with GAME & Chado XML. The Chado JDBC adapter does not save the transactions as XML; it uses Apollo Java transactions (which may have been loaded from transaction XML) to write to the database.]

I think calling transaction XML "GAME XML" is a misnomer and misleading, since it can be used by other adapters besides GAME.

Saved in a separate .tnxml file. Dependent on the data file for adding features. I actually think this is the way to go: you have to save both data and transactions. If you had a new gene in both GAME and transaction XML, it would be redundant, and redundancy has the danger of getting out of sync. So I'm now thinking self-contained transactions are not the way to go. Maybe take out the "depends" bullet; it's a minor point.
11
Chado Transactions
[Diagram labels: Transaction Manager (trans list), Chado Transaction Transformer, Chado Transaction (java).]
Transform Apollo Transaction to Chado Transaction
One to many
One way

[I think we can leave it. It's a quickie, about 1 minute, and it drives home the point that Apollo transactions need to get transformed into Chado transactions, that they are fundamentally different creatures.]

This is all still in Java/Apollo. The Apollo transaction and the Chado transaction are both Java objects here. An Apollo transaction corresponds to an edit in Apollo. A Chado transaction corresponds to an operation on a row in a Chado table.

One Apollo transaction can result in many Chado transactions. The main example is the shared exon (there could be a slide on shared exons that illustrates the Apollo/Chado impedance mismatch).

One way! Since it's one to many, it's actually non-trivial to go from Chado transactions to Apollo transactions (there's no way to group Chado transactions), and there is currently no need for this.
12
Exon Range Change Example
[Diagram: User Edit -> Exon Range Change (Apollo transaction) -> Chado Transaction Transformer -> Delete Old Feature Relationship, Insert Exon Feature, Insert FeatureLoc, Insert New Feature Relationship. Transcript Range Change (Apollo transaction) -> Chado Transaction Transformer -> Update FeatureLoc.]
1 user edit to many Apollo transactions (compound)
1 Apollo transaction to many Chado transactions
Chado exons are shared
Chado lookups not included

[This could also be just a quickie. I think it's interesting to see how one Apollo transaction ends up as a bunch of Chado transactions, and to touch on the shared exon headache.]

This is for going from a shared exon range to a non-shared exon range. There's a bunch of lookup queries as well, not included here. Also, an exon range change can very possibly trigger a transcript and gene range change, not shown here (there would be Apollo transactions for that as well).

This makes clear why it's hard to go from Chado transactions to Apollo transactions: how can you tell which Chado transactions get lumped into an Apollo transaction? There's no way to add anything to Chado transactions to indicate as much (no compound transactions).

Introducing the compound transaction concept as well, with 1 edit to 2 Apollo transactions.
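The one-to-many transformation on this slide can be sketched as a function that takes one exon range change and returns the ordered row operations named in the diagram. The class and method names are hypothetical, the lookup queries mentioned in the notes are left out, and the row descriptions are free text for readability rather than real column values.

```java
import java.util.ArrayList;
import java.util.List;

enum RowOp { INSERT, UPDATE, DELETE }

/** Hypothetical stand-in for a Chado transaction: one operation on one table row. */
final class ChadoRowOp {
    final RowOp op;
    final String table;
    final String description;

    ChadoRowOp(RowOp op, String table, String description) {
        this.op = op;
        this.table = table;
        this.description = description;
    }

    @Override
    public String toString() {
        return op + " " + table + " (" + description + ")";
    }
}

final class ExonRangeChangeSketch {
    /** Shared exon -> non-shared exon: one Apollo edit becomes four row operations. */
    static List<ChadoRowOp> transformSharedExonRangeChange(
            String exonId, String transcriptId, int newFmin, int newFmax) {
        List<ChadoRowOp> ops = new ArrayList<>();
        ops.add(new ChadoRowOp(RowOp.DELETE, "feature_relationship",
                "unlink shared exon " + exonId + " from " + transcriptId));
        ops.add(new ChadoRowOp(RowOp.INSERT, "feature",
                "new exon for " + transcriptId));
        ops.add(new ChadoRowOp(RowOp.INSERT, "featureloc",
                "new exon at " + newFmin + ".." + newFmax));
        ops.add(new ChadoRowOp(RowOp.INSERT, "feature_relationship",
                "link new exon to " + transcriptId));
        return ops;
    }

    /** Transcript range change: a single featureloc update. */
    static List<ChadoRowOp> transformTranscriptRangeChange(
            String transcriptId, int newFmin, int newFmax) {
        List<ChadoRowOp> ops = new ArrayList<>();
        ops.add(new ChadoRowOp(RowOp.UPDATE, "featureloc",
                transcriptId + " now at " + newFmin + ".." + newFmax));
        return ops;
    }
}
```

Nothing in the returned list records which Apollo edit produced it, which is why the reverse direction is hard.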
13
Chado Roundtrip Options
[Diagram: round-trip routes between Apollo and the Chado DB. Labels: GAME Adapter, GAME XML, G2C/C2G; ChadoXML Adapter, Chado XML, XORT; ChadoTrans Adapter, ChadoTrans XML; JDBC Adapter; Chado DB.]
14
Chado Transaction Writeback
[Diagram labels: Chado Transaction (java), Chado Tran XML Writer, XML, XORT, Chado DB.]
Chado transaction object gets written out as Chado XML
XORT commits XML to Chado DB

This route is not being used or tested (an abandoned rice route). We were actually implementing this strategy for rice/GMOD, but Guanming recognized that, given the Chado transaction, it was pretty easy to just write out SQL and do JDBC writeback. Nonetheless, for debugging purposes the Chado transaction XML is really handy.

[This is definitely a quickie slide, about 30 seconds. I just want to point out that this route is available. I guess the other agenda is that I want FlyBase in particular to be aware of this route, as I think this is the way they will want to go once they have an integrated DB: it deals much better with an integrated DB (it's transactional), and it still goes through XORT, which is what they trust. So I guess you could say this is a pitch for fly. It's also yet another option for a new MOD to use if they want transactional commits but are more comfortable with XML/XORT than JDBC (like fly). --MG]
15
Improvements Since Release 1.4.6 (July 2004)
Transactions
Chado JDBC adapter
ChadoXML adapter
Selected minor improvements
16
Chado Roundtrip Options
[Diagram: round-trip routes between Apollo and the Chado DB. Labels: GAME Adapter, GAME XML, G2C/C2G; ChadoXML Adapter, Chado XML, XORT; ChadoTrans Adapter, ChadoTrans XML; JDBC Adapter; Chado DB.]
17
JDBC Writeback
[Diagram labels: Chado Transaction (java), JDBC Trans Writer, Chado SQL, JDBC, Chado DB.]
JDBCTransactionWriter creates SQL from Chado Transaction
JDBC commits SQL to Chado DB
Rice Chado project will use this (in testing phase)

In fact, Chado XML transactions, Chado SQL, and Java Chado transactions are all just different manifestations of an operation on a Chado row. This is being tested and debugged for the rice Chado project. I've fixed a lot of bugs here; it's in pretty good shape.
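To make "creates SQL from Chado Transaction" concrete, here is a minimal sketch that turns a row operation into a parameterized SQL string. It is not the JDBCTransactionWriter from the talk: `SqlSketch` and its methods are hypothetical, and the sketch only builds strings, leaving execution (for instance through a `java.sql.PreparedStatement`) out of scope.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: builds parameterized SQL for one row operation. */
final class SqlSketch {
    /** INSERT with one "?" placeholder per column, in map iteration order. */
    static String insertSql(String table, Map<String, Object> values) {
        List<String> marks = new ArrayList<>();
        for (int i = 0; i < values.size(); i++) {
            marks.add("?");
        }
        return "INSERT INTO " + table
                + " (" + String.join(", ", values.keySet()) + ")"
                + " VALUES (" + String.join(", ", marks) + ")";
    }

    /** UPDATE of the given columns for the row identified by keyColumn. */
    static String updateSql(String table, Map<String, Object> values, String keyColumn) {
        List<String> sets = new ArrayList<>();
        for (String column : values.keySet()) {
            sets.add(column + " = ?");
        }
        return "UPDATE " + table
                + " SET " + String.join(", ", sets)
                + " WHERE " + keyColumn + " = ?";
    }

    /** DELETE of the row identified by keyColumn. */
    static String deleteSql(String table, String keyColumn) {
        return "DELETE FROM " + table + " WHERE " + keyColumn + " = ?";
    }
}
```

Passing a `LinkedHashMap` keeps the column order stable, so the placeholder order matches the order in which values would later be bound.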
18
Name Adapter
[Diagram labels: User, Apollo, Name Edit, Name Adapter, Name Transactions, Transaction Manager (trans list).]
Name adapters capture MOD-specific behavior for IDs and names
Many edits affect names & IDs (merge, split, …)
Rice & Fly have their own name adapters
Inherit from new generic GMOD name adapter

An example: the name adapter gets a gene name edit and then changes the names of transcripts, proteins, and exons.

Configurability is a talking point, not a bullet, I guess(?). By configurable I mean you should be able to configure it from a configuration file and not have to write any Java; presently it's all in Java.

Unclear whether "type-less" should be a bullet or just a talking point. Unlike fly, there's no type in a rice ID (all are prefixed with RICE), so switching types doesn't incur a bunch of ID and name changes. It's simpler.

For undo purposes, name transactions are lumped into one compound transaction (covered in the next slide).
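A name adapter of the kind described here might look like the sketch below: a gene rename comes in, and the adapter returns the full set of dependent name changes. The interface, the class names, and the "keep the suffix, swap the gene-name prefix" rule are assumptions made for illustration only; they are not Apollo's fly or rice adapters.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical record of one name change produced by an adapter. */
final class NameChange {
    final String oldName;
    final String newName;

    NameChange(String oldName, String newName) {
        this.oldName = oldName;
        this.newName = newName;
    }
}

/** Hypothetical adapter interface: MOD-specific naming rules live behind it. */
interface NameAdapterSketch {
    List<NameChange> onGeneRename(String oldGeneName, String newGeneName,
                                  List<String> childNames);
}

/** Assumed rule: child names start with the gene name and keep their suffix. */
final class PrefixNameAdapter implements NameAdapterSketch {
    @Override
    public List<NameChange> onGeneRename(String oldGeneName, String newGeneName,
                                         List<String> childNames) {
        List<NameChange> changes = new ArrayList<>();
        changes.add(new NameChange(oldGeneName, newGeneName));
        for (String child : childNames) {
            if (child.startsWith(oldGeneName)) {
                String suffix = child.substring(oldGeneName.length());
                changes.add(new NameChange(child, newGeneName + suffix));
            }
        }
        return changes;
    }
}
```

Each returned change would become its own name transaction, with the whole list grouped so it can be undone together.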
19
Undo
[Diagram labels: User, Apollo, Undo, Transaction Manager (trans list).]
Facilitated by Transactions
Compound Transactions for compound events (name change, split…)
Partial implementation (only in Annotation Info Editor)

The name adapter slide is an example of an edit that causes a compound transaction: a gene name edit causes transcript, exon, and protein names to be changed.

I'm thinking maybe this slide should go after all the transaction writeback stuff, as it's sort of a side-note addendum, not the main gist. OK, I moved it; previously it went after Apollo transactions (it was slide 5). Now I'm not sure. It's a nice intro to compound transactions. Separate compound transaction slide? But undo is really the motivation for compound transactions.

Transactions use the Command design pattern, which yields undo.
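The link between the Command pattern, compound transactions, and undo can be sketched as below. All names (`ReversibleEdit`, `RenameEdit`, `CompoundEdit`, `UndoStack`) are hypothetical, and a plain map of names stands in for the annotation model.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

/** Command pattern: every edit knows how to apply and how to reverse itself. */
interface ReversibleEdit {
    void apply();
    void undo();
}

/** Hypothetical single edit: rename one entry in a name table. */
final class RenameEdit implements ReversibleEdit {
    private final Map<String, String> names;
    private final String id;
    private final String oldName;
    private final String newName;

    RenameEdit(Map<String, String> names, String id, String oldName, String newName) {
        this.names = names;
        this.id = id;
        this.oldName = oldName;
        this.newName = newName;
    }

    @Override public void apply() { names.put(id, newName); }
    @Override public void undo() { names.put(id, oldName); }
}

/** Compound edit: several edits that are applied and undone as one unit. */
final class CompoundEdit implements ReversibleEdit {
    private final List<ReversibleEdit> children = new ArrayList<>();

    void add(ReversibleEdit e) { children.add(e); }

    @Override public void apply() {
        for (ReversibleEdit e : children) {
            e.apply();
        }
    }

    @Override public void undo() {
        // Reverse order, so later edits are unwound first.
        for (int i = children.size() - 1; i >= 0; i--) {
            children.get(i).undo();
        }
    }
}

/** Minimal undo stack over reversible edits. */
final class UndoStack {
    private final Deque<ReversibleEdit> done = new ArrayDeque<>();

    void execute(ReversibleEdit e) {
        e.apply();
        done.push(e);
    }

    /** Returns false when there is nothing left to undo. */
    boolean undoLast() {
        if (done.isEmpty()) {
            return false;
        }
        done.pop().undo();
        return true;
    }
}
```

With a gene rename and its transcript rename wrapped in one `CompoundEdit`, a single undo restores both names.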
20
Annotation Info Editor
21
JDBC Reader Improvements
More configurable (XML config file)
Reads out-of-range leaf features
Queries optimized
New command-line arguments for reading and writing data

[Mark, I can cover that last point, as it's not really JDBC-specific.]

[Right, I was just going to briefly touch on the fact that it's really handy to use the command line to go from Chado to GAME/ChadoXML and vice versa, so you don't have to bring up the GUI.]

[I moved this slide here, with the idea that you can first talk about improvements in the reader, and then move on to talk about the writer, but it's also OK if you want this to be your last slide (then you can close by saying, "…And now Nomi will tell you about the Chado XML adapter."). --NH]

The goal with the JDBC reader is to become fully configurable, so you don't have to write Java to configure it (similar to the name adapter goal).

Out-of-range HSPs: fly doesn't support this; rice does. Chris has asked fly to do so, and they intend to but never get around to it. This leads to a misleading GUI, where chopped-off features that are part of an in-range feature set aren't there.

The main optimization was getting around an outer join in the search hits query; Postgres doesn't like them. Thanks, Scott.

The command line is crucial for using JDBC read & write with GAME (or Chado XML, I suppose). [Why?] Well, "crucial" is an exaggeration, but this is something Adrian insisted on, and I can see why: otherwise he would have to go through the GUI to make all his GAME files, and subsequently to save them back. Yuck! That's what I'm getting at; I actually don't really go into this.
22
Improvements Since Release 1.4.6 (July 2004)
Transactions
Chado JDBC adapter
ChadoXML adapter
Selected minor improvements
23
Chado Roundtrip Options
[Diagram: round-trip routes between Apollo and the Chado DB. Labels: GAME Adapter, GAME XML, G2C/C2G; Chado XML Adapter, Chado XML, XORT; ChadoTrans Adapter, ChadoTrans XML; JDBC Adapter; Chado DB.]
24
ChadoXML Adapter
[Diagram labels: GAME Adapter, GAME XML, G2C/C2G, Chado XML Adapter, Apollo, Chado XML.]
Read/write ChadoXML without G2C/C2G converters
Option to save annotations only
Names exons using shared exon numbers
  Collect non-redundant set of exons
  Number from lowest to highest start

[I don't really get why the shared exon thing is supposed to be such a problem. It's not a big deal generating the exon numbers, and it's not a big deal saving the data (since they just save redundant copies of the shared exons anyway). Maybe it's more of a problem for the Chado adapter than for the ChadoXML adapter? --NH]

[Well, that's good to hear; perhaps this was an overhyped issue. I think if there were stuff (feature props) attached to exons we'd be in trouble, as we would actually have to track exon number evolution, but that's not the case. Also, it's easier with wipeout, as you just redo them all. The idea of transactions is not to redo them all, but we may have to anyway. Rice solves this by putting ranges in their names rather than numbers (which are trivial to derive dynamically). --MG]
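The exon numbering described on this slide (collect the non-redundant set of exons, then number from lowest to highest start) can be sketched as follows. This is an illustration under stated assumptions: exons are plain {start, end} pairs, strand is ignored, ties on start are broken by end, and the class name `ExonNumbering` is made up for the example.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

final class ExonNumbering {
    /**
     * Takes the exons of all transcripts of a gene as {start, end} pairs,
     * drops duplicates (shared exons), and numbers the rest by ascending start.
     * Returns a map from "start-end" to exon number.
     */
    static Map<String, Integer> numberExons(List<int[]> exonsFromAllTranscripts) {
        // The comparator defines both the sort order and what counts as a duplicate.
        TreeSet<int[]> unique = new TreeSet<>(
                Comparator.<int[]>comparingInt(e -> e[0]).thenComparingInt(e -> e[1]));
        unique.addAll(exonsFromAllTranscripts);

        Map<String, Integer> numbers = new LinkedHashMap<>();
        int next = 1;
        for (int[] exon : unique) {
            numbers.put(exon[0] + "-" + exon[1], next++);
        }
        return numbers;
    }
}
```

Because the numbers are derived from the current exon set, they can simply be regenerated on each save instead of being tracked across edits.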
25
ChadoXML adapter: What it doesn't do (yet)
Doesn't yet handle macros (will soon)
Doesn't yet roundtrip all non-Apollo data (e.g. feature_cvterms)
  Don't have appropriate datamodels inside Apollo
  Need to beef up some datamodels, e.g. for synonyms (author, etc.)
Still somewhat fly-specific
26
Improvements Since Release 1.4.6 (July 2004)
Transactions
Chado JDBC adapter
ChadoXML adapter
Selected minor improvements
27
Selected minor improvements
Better GAME XML schema description (game.rng)
  Can translate rng to xsd (less stringent)
  RELAX-NG can represent elements that occur in any order but only once (e.g. "start" and "end")
[Mention that game.dtd is inaccurate and should be ignored; use game.rng instead.]

<element name="annotation">
  <interleave>
    <optional>
      <attribute name="problem">
        <data type="boolean"/>
      </attribute>
    </optional>
    <attribute name="id">
28
Selected minor improvements
Faster saving of GAME and ChadoXML:
  Buffered saving much faster (seconds vs. minutes)
  Can save just annotations (and genomic residues) without results
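The slide does not say how buffering was implemented in Apollo. As a generic illustration of the technique, the sketch below wraps the output sink in a `java.io.BufferedWriter` so that many small writes are batched into a few large ones; the class name, method name, and 64 KB buffer size are assumptions.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.List;

final class BufferedSaveSketch {
    /** Writes each line through a buffer instead of hitting the sink per call. */
    static void writeLines(Writer sink, List<String> xmlLines) {
        try {
            BufferedWriter out = new BufferedWriter(sink, 64 * 1024);
            for (String line : xmlLines) {
                out.write(line);
                out.write('\n');
            }
            out.flush(); // push any remaining buffered text to the sink
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The caller keeps ownership of the sink (for example a `FileWriter`) and is responsible for closing it.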
29
Selected minor improvements
Synonyms can now be deleted (as well as added or changed) in the annotation info editor
New command-line arguments
  Can specify input (or output) filename or source and format
  Guesses format if not specified
Centralized UserName class and GUI
Fixed Windows-only problem: mouse-over brought main window to front (Jon Slenk of TAIR)

[Leave out command-line args? Mark, you want to talk about those in your part?]

[You can leave them here. I'm just touching on them briefly as they apply to JDBC. Maybe I'll take it out as a bullet and just mention it as an aside? --MG]
30
Apollo Future Plans
31
Coming Soon
New floating panels for expression data, insertions, promoter elements, etc.
Improve analysis adapter
  Load/layer raw computational analysis output (BLAST, BLAT, GENSCAN, etc.)
  Cleaner UI
  More documentation
  Reverse analysis: from subject to query
Less fly-centric ChadoXML adapter
Improve JDBC writeback & transactions

It's unclear how much more JDBC can be sped up. It's actually really fast with rice, and it used to be reasonable with fly's old servers. The new fly servers are really, really slow, but it doesn't really matter, as no one is using JDBC with fly that I know of, and I don't feel like badgering Don over something that's not even used. But rice JDBC will definitely be used, and it's zippy! Whenever I hit a slowdown bottleneck I go into optimization mode, as with the recent feature locs.
32
Coming Not As Soon
Full Undo?
Improve synteny?
Protein editor?
Full Types editor, incorporating Sequence Ontology (SO) terms?
Apollo webstart?
33
Apollo Webstart
Why?
  Need better query tool?
How?
  Launch blank Apollo
  Launch on particular region (Mozilla only)
Who?
  Rumors of success
  Help from community?
34
The End is Near
November 30, 2005: Apollo team runs out of money…unless grant application gets funded
Need justification for grant: Apollo community describes how important Apollo is to their work
35
Examples of Apollo Use
Arabidopsis Information Resource (TAIR)
  Manual curation of computational results
  Wrote new data adapter (relational db)
Institute for Systems Biology, Seattle (Alistair Rust)
  Visualizing putative transcription factor binding site predictions for various algorithms
University of British Columbia Bioinformatics Centre (UBiC)
  Pegasys computational pipeline -> GAME -> Apollo -> manual curation
TIGR
  Helped write initial Chado JDBC adapter
  Plan to use Apollo in production annotation pipeline
ParameciumDB, Genoscope
  Using GMOD software (Chado db, etc.)
  Planning to use Apollo for community annotation curation
36
The Apollo Team
FlyBase Berkeley: Suzanna Lewis, Nomi Harris, Mark Gibson, Sima Misra
CSH: Guanming Wu, Scott Cain
Past contributors:
  Sanger Institute: Steve Searle, Michele Clamp, Vivek Iyer
  HHMI: John Day-Richter
  TIGR: Jonathan Crabtree
  FlyBase: curators

Scott is helping out with getting Chado rice working with Apollo rice (triggers, IDs, loading rice…). He's been a huge help, and he has also helped optimize queries. I don't know where he fits in here. Guanming is of the past but also of the present since the last GMOD meeting.
37
Code available at SourceForge: http://sourceforge.net/projects/gmod
We'd like more feedback from the Apollo community about what they'd like to see in Apollo: talk to us or send us . Installer available at