Published by Randell Carson. Modified over 9 years ago.
1
NeXML: A future data exchange standard for phylogenetics
Rutger Vos, University of British Columbia
2
Introduction (1/7): The problem
Increased automation in evolutionary informatics is hampered by poorly defined “standards”
Outline:
o Introduction: The problem, EvoInfo interests, This subproject, Nexus issues, Parsing, Extensibility, XML goodies
o Design: Principles, Re-use, Patterns, Inheritance, References
o Implementation: Approach, ERD, Inheritance, Anatomy, Characters, Trees
o Current status: Schema blocks, Parsers & writers, Experiments, To do
o Resources
3
Introduction (2/7): EvoInfo interests
Addressing interoperability problems by coding our way out of it
o Syntax: NeXML
o Semantics: CDAO
o Transport: PhyloWS
4
Introduction (3/7): This subproject’s mission
To create a file format like nexus*
o Fix (some) problems with nexus
o Give access to data at higher level
o Be extensible
o Expose data to xml goodies, but:
*Maddison, Swofford and Maddison, 1997. NEXUS: An Extensible File Format for Systematic Information. Syst. Biol. 46(4):590-621
5
#NEXUS
BEGIN TAXA;
    DIMENSIONS NTAX=3;
    TAXLABELS taxon_1 taxon_2 taxon_3;
END;
BEGIN CHARACTERS;
    DIMENSIONS NCHAR=2;
    FORMAT DATATYPE=STANDARD GAP=- MISSING=? SYMBOLS="0 1 2";
    MATRIX
        taxon_1 00
        taxon_2 11
        taxon_3 22;
END;
BEGIN TREES;
    TRANSLATE 1 taxon_1, 2 taxon_2, 3 taxon_3;
    TREE Tree1 = ((1:0.12,2:0.12):9.88,3:10.0);
END;
6
Introduction (4/7): Nexus issues
https://www.nescent.org/wg_evoinfo/NEXUS_Problems
o No explicit versions
o Nothing ever deprecated
o No public extensions
o Leads to hacks such as ‘mixed’ data, ‘hot comments’
o Phylogenetics post-’80s in private blocks
o Hard/impossible to validate
7
Introduction (5/7): Parsing plain text versus parsing XML
o Processing nexus data involves lexing + parsing + processing
o XML allows choosing a parser library; data can then be processed as a structure that hides tokenization issues
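The contrast on this slide can be sketched in Python. This is only an illustration under stated assumptions: the nexus tokenizer is deliberately naive (it ignores comments and quoted labels), and the XML fragment uses made-up element and attribute names (`otus`, `otu`, `label`) that are not taken from the NeXML schema.

```python
import re
import xml.etree.ElementTree as ET

NEXUS_TEXT = """#NEXUS
BEGIN TAXA;
    DIMENSIONS NTAX=3;
    TAXLABELS taxon_1 taxon_2 taxon_3;
END;
"""

# Hypothetical XML equivalent; tag and attribute names are assumptions.
XML_TEXT = """<otus id="t1">
  <otu id="o1" label="taxon_1"/>
  <otu id="o2" label="taxon_2"/>
  <otu id="o3" label="taxon_3"/>
</otus>"""


def nexus_taxlabels(text):
    """Lex the text into tokens, then parse out the TAXLABELS command."""
    # Lexing: words, plus ';' and '=' as separate tokens.
    tokens = re.findall(r"[^\s;=]+|[;=]", text)
    labels = []
    collecting = False
    # Parsing: collect every token between TAXLABELS and the next ';'.
    for token in tokens:
        if token.upper() == "TAXLABELS":
            collecting = True
        elif token == ";":
            collecting = False
        elif collecting:
            labels.append(token)
    return labels


def xml_taxlabels(text):
    """Let the XML library tokenize; we only walk the resulting structure."""
    root = ET.fromstring(text)
    return [otu.get("label") for otu in root.findall("otu")]


if __name__ == "__main__":
    print(nexus_taxlabels(NEXUS_TEXT))
    print(xml_taxlabels(XML_TEXT))
```

In the nexus case the tokenization rules live in our own code; in the XML case they are delegated to whichever parser library is chosen.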
8
Introduction (6/7): Extensibility
Extensible file format should provide the ability to:
o Define new data types that implement described ‘interfaces’
o Attach typed data structures to core types
o Attach custom XML
9
Introduction (7/7): XML goodies
Large stack of off-the-shelf tools:
o XML parser libraries
o Web service toolkits
o Native XML databases
o Editors / IDEs
o Serialization / data binding tools
10
Design (1/5): Design principles
o Re-use of prior art
o Follow design patterns
o Referencing
o Verbose and compact representations
11
Design (2/5): Re-use of prior art
o Generic key/value attachments following Apple’s plist semantics: prior 0.78
o Trees and networks following graphml
o General file structure following nexus concepts, i.e. blocks that reference each other
Avoid tag soup! Will return to this later…
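As a rough illustration of plist-style key/value attachments, the sketch below reads a dict in which each `<key>` is followed by a typed value element. The tag names (`dict`, `key`, `real`, `integer`, `string`) are those of Apple's plist format; the markup NeXML itself used is not shown on the slide, so treating the slide's "prior 0.78" as this fragment is an assumption.

```python
import xml.etree.ElementTree as ET

# Assumed rendering of the slide's "prior 0.78" example in plist markup.
DICT_XML = "<dict><key>prior</key><real>0.78</real></dict>"

# Map plist value tags to Python constructors (only a small subset).
CONVERTERS = {"real": float, "integer": int, "string": str}


def read_dict(text):
    """Turn alternating <key>/<typed value> children into a Python dict."""
    children = list(ET.fromstring(text))
    if len(children) % 2 != 0:
        raise ValueError("expected alternating key/value elements")
    result = {}
    for key_elem, value_elem in zip(children[0::2], children[1::2]):
        if key_elem.tag != "key":
            raise ValueError(f"expected <key>, found <{key_elem.tag}>")
        convert = CONVERTERS[value_elem.tag]  # KeyError on unsupported types
        result[key_elem.text] = convert(value_elem.text)
    return result


if __name__ == "__main__":
    print(read_dict(DICT_XML))
```

The value's type travels with the data, so a reader does not have to guess whether 0.78 is a number or a string.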
12
Design (3/5): XML design patterns
o “Declare before use”
o “Metadata first”
o “Venetian blinds”
o Abstract inheritance through extension, concrete inheritance through restriction
13
Design (4/5): Inheritance
o IDTagged (required id attribute)
o Labelled (optional label attribute)
o Annotated (optional dict elements)
o Base (optional base/lang/href attributes)
o AbstractElement (in root schema)
o ConcreteElement (in instance document)
Diagram labels: extends, restricts
14
Design (5/5): Referencing
o Elements sometimes refer to other elements, much like in nexus
o In nexml, elements refer to the id of other elements by the name of the referenced element:
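The referencing convention can be sketched as follows: an attribute named after the referenced element holds that element's id, as in the `otus="t1"` attribute of the `characters` block on slide 18. The wrapper element `root`, the `trees` element and the second `otus` block are hypothetical, added only to make the example self-contained.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; only <characters ... otus="t1"> mirrors the slides.
DOC = """<root>
  <otus id="t1"/>
  <otus id="t2"/>
  <characters id="c1" otus="t1"/>
  <trees id="tr1" otus="t2"/>
</root>"""


def resolve_references(root):
    """Map (referring id, attribute name) to the referenced element.

    An attribute counts as a reference when an element exists whose tag
    equals the attribute name and whose id equals the attribute value.
    """
    index = {(e.tag, e.get("id")): e for e in root.iter() if e.get("id")}
    links = {}
    for elem in root.iter():
        for name, value in elem.attrib.items():
            if name == "id":
                continue
            target = index.get((name, value))
            if target is not None:
                links[(elem.get("id"), name)] = target
    return links


if __name__ == "__main__":
    for (source_id, name), target in resolve_references(ET.fromstring(DOC)).items():
        print(f"{source_id} --{name}--> {target.get('id')}")
```

Because the attribute name doubles as the target's element name, a processor can resolve the link without a separate lookup table of reference types.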
15
Implementation (1/6): Approach
o Schema design
o Community feedback through wiki, email, telecon, projects (evoinfo, ppod, MIAPA) etc.
o Processors (perl, java, python, c++, VB, JavaScript) development in parallel
o Experiments with xml tools (ws, db, data binding tools)
16
Implementation (2/6): Entity relationships
17
Implementation (3/6): Inheritance tree for elements
18
Implementation (4/6): Anatomy of a “block”
<characters id="c1" xsi:type="nex:DnaSeqs" otus="t1">
    desc description
    …
    Contents…
19
Implementation (5/6): Character classes
Data type      Cells              Sequence
DNA            DnaCells           DnaSeqs
RNA            RnaCells           RnaSeqs
Protein        ProteinCells       ProteinSeqs
Standard       StandardCells      StandardSeqs
Continuous     ContinuousCells    ContinuousSeqs
Restriction    RestrictionCells   RestrictionSeqs
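One way to picture the two columns is sketched below. It assumes that the Seqs classes store each row as one compact string and the Cells classes store one record per cell (the "compact and verbose" representations mentioned on slide 21); that mapping is an interpretation, and the record layout with `row`, `char` and `state` keys is hypothetical. The conversion only works for single-symbol states, so it would not apply to continuous data.

```python
def seq_to_cells(row_label, seq):
    """Expand a compact row string into one record per character state."""
    return [
        {"row": row_label, "char": position, "state": symbol}
        for position, symbol in enumerate(seq, start=1)
    ]


def cells_to_seq(cells):
    """Collapse cell records for one row back into a compact string."""
    ordered = sorted(cells, key=lambda cell: cell["char"])
    return "".join(cell["state"] for cell in ordered)


if __name__ == "__main__":
    # Rows taken from the nexus matrix on slide 5.
    matrix = {"taxon_1": "00", "taxon_2": "11", "taxon_3": "22"}
    for label, seq in matrix.items():
        cells = seq_to_cells(label, seq)
        print(label, cells, cells_to_seq(cells))
```

The verbose form leaves room to attach information to an individual cell, while the compact form keeps large matrices small.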
20
Implementation (6/6): Tree classes
           Int          Float
Tree       IntTree      FloatTree
Network    IntNetwork   FloatNetwork
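A minimal sketch of what a graph-style tree with typed edge lengths might look like, using the tree from the nexus example on slide 5, ((1:0.12,2:0.12):9.88,3:10.0). The edge-list layout, the node names `root` and `n1`, and the `length_type` parameter are assumptions made for illustration; they stand in for the Int/Float distinction in the class names above and are not taken from the schema.

```python
# Edges as (source, target, length-as-text), the way a parser would first see them.
EDGES = [
    ("root", "n1", "9.88"),
    ("n1", "taxon_1", "0.12"),
    ("n1", "taxon_2", "0.12"),
    ("root", "taxon_3", "10.0"),
]


def root_to_tip(edges, length_type=float):
    """Sum edge lengths from each tip up to the root.

    length_type is int for integer-length trees and float otherwise.
    Assumes a tree: every node has at most one incoming edge.
    """
    parent = {target: (source, length_type(length)) for source, target, length in edges}
    internal = {source for source, _, _ in edges}
    distances = {}
    for tip in (node for node in parent if node not in internal):
        total = length_type(0)
        node = tip
        while node in parent:
            node, length = parent[node]
            total += length
        distances[tip] = total
    return distances


if __name__ == "__main__":
    print(root_to_tip(EDGES))
```

Storing nodes and edges separately is what lets the same structure describe networks, where a node may have more than one incoming edge; the traversal above would then need to change.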
21
Current status (1/4): Schema blocks
Done:
o OTUs
o characters: dna, rna, nucleotide, protein, categorical, continuous, restriction (compact and verbose)
o trees: graphml trees and networks, various edge formats and rootings
22
Current status (2/4): Parsers and writers
Nexml parsers and writers:
o mesquite (java NeXML class libraries)
o Bio::Phylo (BioPerl compatible)
o pyNexml (python)
o DAMBE (Visual Basic)
o NCL (C++)
o JavaScript
23
Current status (3/4): Experiments
Semantic annotation (CDAO) using SAWSDL
Scalability:
o Indexed files in dbxml
o Created large files from tolweb, rbcl
o XInclude with tinyseq xml
REST Web services:
o ToL service
o validation service
o nexml2json, nexus2xml
o Schema inclusion in wsdl
24
Current status (4/4): To do
o Publish standard
o More restricted vocabulary attachments (e.g. Darwin core, CDAO-mediated terms)
o Substitution model descriptions
o Sets (in progress, using class identifiers)
o Distances
o Splits
25
Resources
NeXML base URL: http://www.nexml.org
o Wiki: /wiki
o Mailing list: /mail
o Issue tracker: /tracker
o SVN repository: /code
EvoInfo: http://evoinfo.nescent.org
CDAO: http://www.evolutionaryontology.org
27
Acknowledgements
o Contributions: Jason Caravas, Mark Holder, Peter Midford, Jeet Sukumaran, Xuhua Xia
o Feedback: wg-evoinfo, pPOD, Wayne Maddison, David Maddison
o Additional funding, support: NESCent, GSoC