Published by Jerome Cummings. Modified over 7 years ago.
1
Nexml
Rutger Vos and Wayne Maddison, University of British Columbia
2
Introduction (1/5) The idea
A file format like NEXUS, but one that:
- Fixes (some) problems with NEXUS
- Gives access to data at a higher level
- Is extensible
- Exposes data to XML goodies
3
Introduction (2/5) Nexus problems
- Hard/impossible to validate
- No explicit versions
- Nothing ever deprecated
- No public extensions
- Leads to hacks such as ‘mixed’ data, ‘hot comments’
- Phylogenetics post-’80s in private blocks
4
Introduction (3/5) Higher level data access
- Processing NEXUS data involves lexing + parsing + processing
- XML allows choosing a parser library; the data can then be processed as a structure that hides tokenization issues
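As a rough illustration of the point about structured access, the sketch below reads taxon labels through Python's standard-library ElementTree parser, with no hand-written lexer. The document layout (a `nexml` root with a `taxa` wrapper and the sample labels) is an assumption made for the example, not the actual nexml schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified document: the <taxa> wrapper and the sample
# labels are illustrative and are not taken from the nexml schema.
DOC = """
<nexml version="1.0" generator="example">
  <taxa id="t1">
    <taxon id="o1" label="Homo sapiens"/>
    <taxon id="o2" label="Pan troglodytes"/>
  </taxa>
</nexml>
"""


def taxon_labels(xml_text):
    """Return the label of every <taxon> element, in document order."""
    root = ET.fromstring(xml_text)
    # The parser has already handled quoting, whitespace and escaping;
    # we only navigate the resulting tree.
    return [taxon.get("label") for taxon in root.iter("taxon")]


labels = taxon_labels(DOC)
print(labels)
```

The same tree could be produced by any other XML parser library, which is the flexibility the slide refers to.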
5
Introduction (4/5) Extensibility
An ‘extensible’ file format should, more robustly than NEXUS, provide the ability to:
- define new data types that implement described ‘interfaces’
- attach typed data structures to core types
- attach custom XML
6
Introduction (5/5) XML goodies
Large stack of off-the-shelf tools:
- XML parser libraries
- Web services
- Native XML databases
- Editors/IDEs
- Serialization tools
7
Design (1/4) Design principles
- Re-use of prior art
- Follow design patterns
- Referencing
- Verbose and compact representations
8
Design (2/4) Re-use of prior art
- Generic key/value attachments following Apple’s plist semantics:
  <dict> <key>prior</key> <float>0.78</float> </dict>
- Trees and networks following GraphML
- General file structure following NEXUS concepts, i.e. blocks that reference each other
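A consumer might turn such a plist-style attachment into a native mapping roughly as follows. This is a minimal sketch: the converter table is an assumption (only `<float>` and `<string>` appear in the slides; `<integer>` is borrowed from plist), and `dict_to_python` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

SNIPPET = "<dict><key>prior</key><float>0.78</float></dict>"

# Assumed converters for value elements; unknown tags fall back to str.
CONVERTERS = {"float": float, "integer": int, "string": str}


def dict_to_python(dict_elem):
    """Convert alternating <key>/<value> children into a Python dict."""
    children = list(dict_elem)
    result = {}
    # Children alternate key, value, key, value, ...
    for key_elem, value_elem in zip(children[0::2], children[1::2]):
        if key_elem.tag != "key":
            raise ValueError(f"expected <key>, found <{key_elem.tag}>")
        convert = CONVERTERS.get(value_elem.tag, str)
        result[key_elem.text] = convert(value_elem.text or "")
    return result


annotations = dict_to_python(ET.fromstring(SNIPPET))
print(annotations)
```

Because the value element names its own type, the reader does not have to guess whether `0.78` is a number or a string.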
9
Design (3/4) XML design patterns
- “Declare before use”
- “Metadata first”
- “Venetian blinds”
- Abstract inheritance through extension, concrete inheritance through restriction
10
Design (4/4) Referencing
- Elements sometimes refer to other elements, much like in NEXUS
- In nexml, an element refers to another element by its id, using an attribute named after the referenced element:
  <taxon id="t1"/> <!-- i.e. OTU, referenced later as: -->
  <node id="n1" taxon="t1"/>
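Resolving such id references can be sketched as a simple index lookup. The wrapper element `example`, the `label` values and the helper name `resolve_taxa` are assumptions made for illustration; only the `taxon`/`node` pattern comes from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment built around the taxon/node pattern shown above.
DOC = """
<example>
  <taxon id="t1" label="Homo sapiens"/>
  <taxon id="t2" label="Pan troglodytes"/>
  <node id="n1" taxon="t1"/>
  <node id="n2" taxon="t2"/>
  <node id="n3"/>
</example>
"""


def resolve_taxa(xml_text):
    """Map each node id to the label of the taxon it references."""
    root = ET.fromstring(xml_text)
    # Index taxa by id first; this mirrors "declare before use".
    by_id = {taxon.get("id"): taxon for taxon in root.iter("taxon")}
    resolved = {}
    for node in root.iter("node"):
        ref = node.get("taxon")
        if ref is None:
            continue  # node without a taxon reference, e.g. an internal node
        if ref not in by_id:
            raise KeyError(f"node {node.get('id')} refers to unknown taxon {ref}")
        resolved[node.get("id")] = by_id[ref].get("label")
    return resolved


print(resolve_taxa(DOC))
```

A dangling reference raises an error here; a real processor might instead rely on schema validation to catch it earlier.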
11
Nexml (1/10) Approach
- Schema design
- Community feedback through wiki, , telecon, meetings (evoinfo, pPOD) etc.
- Processor development (Perl, Mesquite, Python) in parallel
- Experiments with XML tools (web services, databases, serialization)
12
Nexml (2/10) root element
Attributes: version="1.0" generator="mesquite"
Versioned namespace: xmlns:nex="
13
Nexml (3/10) inheritance tree for elements
- “Base”, optional base/lang/href attributes
- “Annotated” (extends “Base”), optional dict elements
- “Labelled” (extends “Annotated”), optional label attribute
- “IDTagged” (extends “Labelled”), required id attribute
- “AbstractElement” (extends “IDTagged”), in root schema
- “ConcreteElement” (restricts “AbstractElement”), in instance document
14
Nexml (4/10) anatomy of a “block”
- Name (e.g. "characters"), id attribute, xsi:type attribute naming the concrete subclass (e.g. "nex:DnaSeqs"), possible reference to another element:
  <characters id="c1" xsi:type="nex:DnaSeqs" taxa="t1"> </characters>
- Metadata attachment:
  <dict><key>desc</key><string>description…</string></dict>
- Contents…
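Reading the parts of a block back out could look roughly like this. The `nex` namespace URI below is a placeholder (the real one is not given here), the `nexml` wrapper and description text are assumptions, and `describe_blocks` is a hypothetical helper; the XMLSchema-instance URI for `xsi` is the standard W3C one.

```python
import xml.etree.ElementTree as ET

# Standard W3C namespace for xsi:type.
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# The nex namespace URI is a placeholder, not the real nexml namespace.
DOC = """
<nexml xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:nex="http://example.org/placeholder-nexml-namespace">
  <characters id="c1" xsi:type="nex:DnaSeqs" taxa="t1">
    <dict><key>desc</key><string>example description</string></dict>
  </characters>
</nexml>
"""


def describe_blocks(xml_text):
    """Summarise name, id, concrete type and taxa reference of each block."""
    root = ET.fromstring(xml_text)
    summaries = []
    for block in root:
        summaries.append({
            "name": block.tag,
            "id": block.get("id"),
            # ElementTree exposes namespaced attributes as "{uri}local".
            "type": block.get(f"{{{XSI}}}type"),
            "taxa": block.get("taxa"),
        })
    return summaries


print(describe_blocks(DOC))
```

Note that ElementTree returns the `xsi:type` value as the plain string `nex:DnaSeqs`; mapping the `nex` prefix back to its namespace is left to the caller.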
15
Nexml (5/10) Character Classes
Data type      Granularity: Sequence    Granularity: Cells
DNA            nex:DnaSeqs              nex:DnaCells
RNA            nex:RnaSeqs              nex:RnaCells
Protein        nex:ProteinSeqs          nex:ProteinCells
Standard       nex:StandardSeqs         nex:StandardCells
Continuous     nex:ContinuousSeqs       nex:ContinuousCells
Restriction    nex:RestrictionSeqs      nex:RestrictionCells
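The class names follow a regular pattern (data type plus granularity), which a processor could exploit when choosing an xsi:type value. The sketch below only reproduces the naming scheme from the table; the function name `character_class` is hypothetical.

```python
# Name fragments taken from the class names in the table above.
DATA_TYPES = ("Dna", "Rna", "Protein", "Standard", "Continuous", "Restriction")
GRANULARITIES = ("Seqs", "Cells")


def character_class(data_type, granularity):
    """Build the xsi:type value for a characters block, e.g. nex:DnaSeqs."""
    if data_type not in DATA_TYPES:
        raise ValueError(f"unknown data type: {data_type}")
    if granularity not in GRANULARITIES:
        raise ValueError(f"unknown granularity: {granularity}")
    return f"nex:{data_type}{granularity}"


# Enumerate every combination listed in the table.
ALL_CLASSES = [character_class(d, g) for d in DATA_TYPES for g in GRANULARITIES]
print(ALL_CLASSES)
```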
16
Nexml (6/10) Tree Classes
Topology    Branch type: Float    Branch type: Int
Network     nex:FloatNetwork      nex:IntNetwork
Tree        nex:FloatTree         nex:IntTree
17
Nexml (7/10) blocks, current status
Done:
- OTUs
- characters: dna, rna, nucleotide, protein, categorical, continuous, restriction (compact and verbose)
- trees: GraphML trees and networks
18
Nexml (8/10) blocks, current status
To do:
- sets (in progress, using class identifiers)
- substitution model descriptions (KS progress)
- more restricted vocabulary attachments (Darwin Core)
- distances
- splits
- cross-reference with glossary, ontology
- follow up on earlier feedback (small fixes)
19
Nexml (9/10) Experiments
- XML parsers: expat, libxml2, JDOM
- Processed schema using XMLBeans
- Included schema in SOAP WSDL
- Indexed files in DB XML
- Created large files from ToLWeb, rbcL
- XInclude with TinySeq XML
- REST service described using nexml
20
Nexml (10/10) Resources
GSoC
- Base URL
- SVN
- Wiki
- SourceForge repository
21
Acknowledgements
- Contributions: Jason Caravas, Mark Holder, Peter Midford, Jeet Sukumaran
- Feedback: wg-evoinfo, pPOD
- Additional funding, support: NESCent, GSoC