Nexml A future data exchange standard for phylogenetics

1 Nexml A future data exchange standard for phylogenetics
Rutger Vos, University of British Columbia

2 Introduction (1/7) The problem
Outline:
- Introduction: The problem, EvoInfo interests, This subproject, Nexus issues, Parsing, Extensibility, XML goodies
- Design: Principles, Re-use, Patterns, Inheritance, References
- Implementation: Approach, Example, Anatomy, Characters, Trees
- Current status: Schema blocks, Parsers & writers, Experiments, To do
- Resources
Increased automation in evolutionary informatics is hampered by poorly defined “standards”.

3 Introduction (2/7) EvoInfo interests
- Semantics: CDAO
- Syntax: Nexml
- Transport: PhyloWS
Addressing interoperability problems by coding our way out of them.

4 Introduction (3/7) This subproject’s mission
To create a file format like nexus*, but:
- Fix (some) problems with nexus
- Give access to data at a higher level
- Be extensible
- Expose data to xml goodies
*Maddison, Swofford and Maddison, NEXUS: An Extensible File Format for Systematic Information. Syst. Biol. 46(4):

5 Introduction (4/7) Nexus problems
- Hard/impossible to validate
- No explicit versions
- Nothing ever deprecated
- No public extensions
- Leads to hacks such as ‘mixed’ data, ‘hot comments’
- Phylogenetics post-’80s in private blocks

6 Introduction (5/7) Parsing plain text versus parsing XML
- Processing nexus data involves lexing + parsing + processing
- XML allows choosing a parser library; the data can then be processed as a structure that hides tokenization issues
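As a minimal sketch of the second point, the snippet below reads a small XML fragment with Python's standard `xml.etree.ElementTree` parser. The fragment, its `label` attributes, and the helper name `otu_labels` are illustrative assumptions, not taken from the nexml schema; the point is only that the calling code never tokenizes anything itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment for illustration; not a complete or validated nexml document.
DOC = """
<otus id="taxa1">
  <otu id="t1" label="Homo sapiens"/>
  <otu id="t2" label="Pan troglodytes"/>
</otus>
"""


def otu_labels(xml_text):
    """Return {otu id: label}, relying on the parser's tree instead of manual lexing."""
    root = ET.fromstring(xml_text)
    # Labels containing spaces need no special quoting rules here:
    # the parser has already handled attribute tokenization.
    return {otu.get("id"): otu.get("label") for otu in root.iter("otu")}


labels = otu_labels(DOC)
print(labels)
```

Any other XML library could be substituted without changing the data, which is the flexibility the slide refers to.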

7 Introduction (6/7) Extensibility
An ‘extensible’ file format should provide the ability to:
- define new data types that implement described ‘interfaces’
- attach typed data structures to core types
- attach custom XML

8 Introduction (7/7) XML goodies
Large stack of off-the-shelf tools:
- XML parser libraries
- Web service toolkits
- Native XML databases
- Editors / IDEs
- Serialization / data binding tools

9 Design (1/5) Design principles
- Re-use of prior art
- Follow design patterns
- Referencing
- Verbose and compact representations

10 Design (2/5) Re-use of prior art
- Generic key/value attachments following Apple’s plist semantics:
    <dict> <key>prior</key> <float>0.78</float> </dict>
- Trees and networks following graphml
- General file structure following nexus concepts, i.e. blocks that reference each other
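The key/value attachment above can be read into an ordinary dictionary. The following is a sketch only: the `read_dict` helper and the `CONVERTERS` table are hypothetical, and the set of value elements beyond `<float>` (here `<integer>` and `<string>`) is an assumption rather than something the slides specify.

```python
import xml.etree.ElementTree as ET

SNIPPET = "<dict><key>prior</key><float>0.78</float></dict>"

# Assumed subset of value element names; only <float> appears in the slides.
CONVERTERS = {"float": float, "integer": int, "string": str}


def read_dict(xml_text):
    """Turn alternating <key>/<value> children into a Python dict."""
    children = list(ET.fromstring(xml_text))
    result = {}
    # Children come in pairs: a <key> followed by one typed value element.
    for key_el, val_el in zip(children[0::2], children[1::2]):
        if key_el.tag != "key":
            raise ValueError(f"expected <key>, found <{key_el.tag}>")
        convert = CONVERTERS.get(val_el.tag, str)  # fall back to plain text
        result[key_el.text] = convert(val_el.text or "")
    return result


annotations = read_dict(SNIPPET)
print(annotations)
```

Because the value element carries its own type, a reader can convert values without knowing in advance what a given key means.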

11 Design (3/5) XML design patterns
- “Declare before use”
- “Metadata first”
- “Venetian blinds”
- Abstract inheritance through extension, concrete inheritance through restriction

12 Design (4/5) Inheritance
Each type extends the one listed before it; the last step is a restriction:
- “Base”: optional base/lang/href attributes
- “Annotated”: extends “Base”, optional dict elements
- “Labelled”: extends “Annotated”, optional label attribute
- “IDTagged”: extends “Labelled”, required id attribute
- “AbstractElement”: extends “IDTagged”, in root schema
- “ConcreteElement”: restricts “AbstractElement”, in instance document

13 Design (5/5) Referencing
- Elements sometimes refer to other elements, much like in nexus
- In nexml, an element refers to another element's id through an attribute named after the referenced element:
    <otu id="t1"/> <!-- i.e. OTU, referenced later as: -->
    <node id="n1" otu="t1"/>
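A reader can resolve such references with a single id lookup table. The sketch below assumes a hypothetical `<root>` wrapper element and an extra node without an `otu` attribute, neither of which comes from the slides; `resolve_otus` is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper and extra elements, for illustration only.
DOC = """
<root>
  <otu id="t1"/>
  <otu id="t2"/>
  <node id="n1" otu="t1"/>
  <node id="n2" otu="t2"/>
  <node id="n3"/>
</root>
"""


def resolve_otus(xml_text):
    """Map each node id to the <otu> element named by its 'otu' attribute."""
    root = ET.fromstring(xml_text)
    otus_by_id = {otu.get("id"): otu for otu in root.iter("otu")}
    resolved = {}
    for node in root.iter("node"):
        ref = node.get("otu")
        if ref is None:
            continue  # node carries no reference, nothing to resolve
        if ref not in otus_by_id:
            raise KeyError(f"node {node.get('id')} refers to unknown otu {ref}")
        resolved[node.get("id")] = otus_by_id[ref]
    return resolved


links = resolve_otus(DOC)
print({node_id: otu.get("id") for node_id, otu in links.items()})
```

Naming the attribute after the referenced element means the lookup table to consult is evident from the attribute name alone.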

14 Implementation (1/6) Approach
- Schema design
- Community feedback through wiki, telecon, projects (evoinfo, ppod, MIAPA) etc.
- Development of processors (perl, java, python, c++, VB) in parallel
- Experiments with xml tools (ws, db, data binding tools)

15 Implementation (2/6) root element
version="1.0" generator="mesquite"
Versioned namespace: xmlns:nex="

16 Implementation (3/6) inheritance tree for elements

17 Implementation (4/6) anatomy of a “block”
<characters id="c1" xsi:type="nex:DnaSeqs" otus="t1">
  <dict> <key>desc</key> <string>description…</string> </dict>
  Contents…
</characters>
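To show how a processor might inspect such a block, the sketch below reads the `xsi:type`, `id` and `otus` attributes with `xml.etree.ElementTree`. The `xsi` namespace URI is the standard XML Schema instance namespace; the `nex` namespace URI is a made-up placeholder, since the real one is not given in the transcript, and `describe_block` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# The nex namespace URI below is a placeholder assumption, not the real one.
BLOCK = """
<characters xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xmlns:nex="http://example.org/placeholder-nex-namespace"
            id="c1" xsi:type="nex:DnaSeqs" otus="t1">
  <dict><key>desc</key><string>description</string></dict>
</characters>
"""


def describe_block(xml_text):
    """Summarize a block: its id, concrete type, referenced OTUs, and annotations."""
    block = ET.fromstring(xml_text)
    # ElementTree exposes namespaced attributes as "{namespace-uri}localname".
    qname = block.get(f"{{{XSI}}}type")
    return {
        "id": block.get("id"),
        "type": qname.split(":")[-1],  # drop the "nex:" prefix
        "otus": block.get("otus"),
        "has_dict": block.find("dict") is not None,
    }


info = describe_block(BLOCK)
print(info)
```

The concrete type name is what tells a reader which of the character classes to instantiate for the block's contents.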

18 Implementation (5/6) Character Classes
Classes by data type (rows) and granularity (columns):
Data type     Sequence          Cells
DNA           DnaSeqs           DnaCells
RNA           RnaSeqs           RnaCells
Protein       ProteinSeqs       ProteinCells
Standard      StandardSeqs      StandardCells
Continuous    ContinuousSeqs    ContinuousCells
Restriction   RestrictionSeqs   RestrictionCells
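The class names follow a regular pattern: a data type prefix followed by a granularity suffix. As a small sketch of that naming scheme (the function names `character_class` and `split_class` are hypothetical, not part of any nexml library):

```python
# Prefixes and suffixes as they appear in the class names on the slide.
DATA_TYPES = ("Dna", "Rna", "Protein", "Standard", "Continuous", "Restriction")
GRANULARITIES = ("Seqs", "Cells")


def character_class(data_type, granularity):
    """Compose a class name such as 'DnaSeqs' from its two parts."""
    if data_type not in DATA_TYPES:
        raise ValueError(f"unknown data type: {data_type}")
    if granularity not in GRANULARITIES:
        raise ValueError(f"unknown granularity: {granularity}")
    return data_type + granularity


def split_class(name):
    """Inverse of character_class: 'RnaCells' -> ('Rna', 'Cells')."""
    for granularity in GRANULARITIES:
        prefix = name[: -len(granularity)]
        if name.endswith(granularity) and prefix in DATA_TYPES:
            return prefix, granularity
    raise ValueError(f"not a character class name: {name}")


all_classes = [character_class(d, g) for d in DATA_TYPES for g in GRANULARITIES]
print(all_classes)
```

A processor can use the split form to pick a state alphabet from the data type and a parsing strategy from the granularity independently.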

19 Implementation (6/6) Tree Classes
Classes by topology (rows) and branch type (columns):
Topology   Float          Int
Network    FloatNetwork   IntNetwork
Tree       FloatTree      IntTree

20 Current status (1/4) Schema blocks
Done:
- OTUs
- characters: dna, rna, nucleotide, protein, categorical, continuous, restriction (compact and verbose)
- trees: graphml trees and networks, various edge formats and rootings

21 Current status (2/4) Parsers and writers
Nexml parsers and writers:
- mesquite, java, using xmlbeans
- Bio::Phylo, perl
- pyNexml, python
- DAMBE, Visual Basic
- stubs for c++ xmlbeans
- plans for ruby?

22 Current status (3/4) Experiments
- Included schema in soap wsdl
- Indexed files in dbxml
- Created large files from tolweb, rbcl
- XInclude with tinyseq xml
- REST service described using nexml

23 Current status (4/4) To do
- Cross-reference with glossary, ontology
- Substitution model descriptions
- Publish standard
- Follow up on earlier feedback (small fixes)
- Sets (in progress, using class identifiers)
- more restricted vocabulary attachments (Darwin core)
- Distances
- Splits

24 Resources
- Base URL
- Wiki
- SourceForge project

25 Acknowledgements
Contributions: Jason Caravas, Mark Holder, Peter Midford, Jeet Sukumaran, Xuhua Xia
Feedback: wg-evoinfo, pPOD, Wayne Maddison, David Maddison
Additional funding, support: NESCent, GSoC

