NeXML A future data exchange standard for phylogenetics Rutger Vos University of British Columbia.

1 NeXML: A future data exchange standard for phylogenetics
Rutger Vos, University of British Columbia

2 Introduction (1/7) The problem
Increased automation in evolutionary informatics is hampered by poorly defined “standards”.
Outline:
o Introduction: The problem, EvoInfo interests, This subproject, Nexus issues, Parsing, Extensibility, XML goodies
o Design: Principles, Re-use, Patterns, Inheritance, References
o Implementation: Approach, ERD, Inheritance, Anatomy, Characters, Trees
o Current status: Schema blocks, Parsers & writers, Experiments, To do
o Resources

3 Introduction (2/7) EvoInfo interests
Addressing interoperability problems by coding our way out of them:
o Syntax: NeXML
o Semantics: CDAO
o Transport: PhyloWS

4 Introduction (3/7) This subproject’s mission
To create a file format like nexus*, but:
o Fix (some) problems with nexus
o Give access to data at higher level
o Be extensible
o Expose data to xml goodies
*Maddison, Swofford and Maddison, 1997. NEXUS: An Extensible File Format for Systematic Information. Syst. Biol. 46(4):590-621

5
#NEXUS
BEGIN TAXA;
    DIMENSIONS NTAX=3;
    TAXLABELS taxon_1 taxon_2 taxon_3;
END;
BEGIN CHARACTERS;
    DIMENSIONS NCHAR=2;
    FORMAT DATATYPE=STANDARD GAP=- MISSING=? SYMBOLS="0 1 2";
    MATRIX
        taxon_1 00
        taxon_2 11
        taxon_3 22;
END;
BEGIN TREES;
    TRANSLATE 1 taxon_1, 2 taxon_2, 3 taxon_3;
    TREE Tree1 = ((1:0.12,2:0.12):9.88,3:10.0);
END;

6 Introduction (4/7) Nexus issues
https://www.nescent.org/wg_evoinfo/NEXUS_Problems
o No explicit versions
o Nothing ever deprecated
o No public extensions
o Leads to hacks such as ‘mixed’ data, ‘hot comments’
o Phylogenetics post-’80s in private blocks
o Hard/impossible to validate

7 Introduction (5/7) Parsing plain text versus parsing XML
o Processing nexus data involves lexing + parsing + processing
o XML allows choosing a parser library; the data can then be processed as a structure that hides tokenization issues
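The contrast on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the tokenizer is deliberately naive (it ignores quoted labels and bracketed comments, which real nexus files use), and the XML fragment with its `otus`/`otu` elements and `label` attribute is an assumed, simplified stand-in rather than schema-valid NeXML.

```python
import re
import xml.etree.ElementTree as ET

# TAXA block taken from the nexus example on slide 5.
NEXUS_TEXT = """#NEXUS
BEGIN TAXA;
    DIMENSIONS NTAX=3;
    TAXLABELS taxon_1 taxon_2 taxon_3;
END;
"""

# Hypothetical XML rendering of the same taxa (element and attribute
# names are assumptions made for this sketch).
XML_TEXT = """<otus id="t1">
    <otu id="o1" label="taxon_1"/>
    <otu id="o2" label="taxon_2"/>
    <otu id="o3" label="taxon_3"/>
</otus>"""


def nexus_taxlabels(text):
    """Hand-rolled lexing: tokenize, then scan for TAXLABELS ... ;"""
    # A token is either a lone semicolon or a run of characters that
    # contains neither whitespace nor a semicolon.
    tokens = re.findall(r";|[^\s;]+", text)
    labels = []
    collecting = False
    for token in tokens:
        if token.upper() == "TAXLABELS":
            collecting = True
        elif token == ";":
            collecting = False
        elif collecting:
            labels.append(token)
    return labels


def xml_taxlabels(text):
    """The parser library handles tokenization; we only walk the structure."""
    root = ET.fromstring(text)
    return [otu.get("label") for otu in root.findall("otu")]


nexus_labels = nexus_taxlabels(NEXUS_TEXT)
xml_labels = xml_taxlabels(XML_TEXT)
print(nexus_labels)
print(xml_labels)
```

Both functions return the same three labels, but only the nexus version has to decide what a token is.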

8 Introduction (6/7) Extensibility
An extensible file format should provide the ability to:
o Define new data types that implement described ‘interfaces’
o Attach typed data structures to core types
o Attach custom XML

9 Introduction (7/7) XML goodies
Large stack of off-the-shelf tools:
o XML parser libraries
o Web service toolkits
o Native XML databases
o Editors / IDEs
o Serialization / data binding tools

10 Design (1/5) Design principles
o Re-use of prior art
o Follow design patterns
o Referencing
o Verbose and compact representations

11 Design (2/5) Re-use of prior art
o Generic key/value attachments following Apple’s plist semantics, e.g. a key “prior” with the value 0.78
o Trees and networks following graphml
o General file structure following nexus concepts, i.e. blocks that reference each other
Avoid tag soup! Will return to this later…
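To make the graphml-style idea concrete, the sketch below writes the tree from the nexus example on slide 5 as a flat list of nodes and edges instead of a nested parenthetical string. The element names (`tree`, `node`, `edge`), the `source`/`target`/`length` attributes, and the ids `root` and `n1` are assumptions chosen for illustration; the slides do not show the actual markup.

```python
import xml.etree.ElementTree as ET

# (source node, target node, branch length) for the slide 5 tree
# ((1:0.12,2:0.12):9.88,3:10.0), with the TRANSLATE table applied.
# "root" and "n1" are made-up ids for the two internal nodes.
EDGES = [
    ("root", "n1", 9.88),
    ("n1", "taxon_1", 0.12),
    ("n1", "taxon_2", 0.12),
    ("root", "taxon_3", 10.0),
]


def build_tree_element(tree_id, edges):
    """Serialize an edge list as one <tree> holding <node> and <edge> children."""
    tree = ET.Element("tree", id=tree_id)
    # Collect node ids in first-seen order so the output is stable.
    node_ids = []
    for source, target, _length in edges:
        for node_id in (source, target):
            if node_id not in node_ids:
                node_ids.append(node_id)
    # Nodes are declared before the edges that refer to them.
    for node_id in node_ids:
        ET.SubElement(tree, "node", id=node_id)
    for index, (source, target, length) in enumerate(edges, start=1):
        ET.SubElement(
            tree, "edge",
            id=f"e{index}", source=source, target=target, length=str(length),
        )
    return tree


def find_root(tree):
    """The root is the only node that is never the target of an edge."""
    targets = {edge.get("target") for edge in tree.findall("edge")}
    roots = [
        node.get("id")
        for node in tree.findall("node")
        if node.get("id") not in targets
    ]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {roots}")
    return roots[0]


tree = build_tree_element("Tree1", EDGES)
root_id = find_root(tree)
print(ET.tostring(tree, encoding="unicode"))
print("root:", root_id)
```

Because nodes and edges are separate elements, the same layout can also describe a network: a node with two incoming edges is simply two `edge` entries sharing a target, which a nested parenthetical string cannot express directly.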

12 Design (3/5) XML design patterns
o “Declare before use”
o “Metadata first”
o “Venetian blinds”
o Abstract inheritance through extension, concrete inheritance through restriction

13 Design (4/5) Inheritance
Types in the inheritance diagram:
o IDTagged (required id attribute)
o Labelled (optional label attribute)
o Annotated (optional dict elements)
o Base (optional base/lang/href attributes)
o AbstractElement (in root schema)
o ConcreteElement (in instance document)
Diagram labels: extends, restricts

14 Design (5/5) Referencing
o Elements sometimes refer to other elements, much like in nexus
o In nexml, an element refers to another element’s id through an attribute named after the referenced element, e.g. otus="t1" in the block example on slide 18
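A rough sketch of how a processor might resolve such references follows. It assumes a simplified, namespace-free document; the `nexml` root and the `trees` element are illustrative names not taken from the slides, and only the `characters` element with its `otus="t1"` attribute mirrors the block shown on slide 18.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free document (an assumption for this sketch).
DOC = """<nexml>
    <otus id="t1">
        <otu id="o1"/>
        <otu id="o2"/>
    </otus>
    <characters id="c1" otus="t1"/>
    <trees id="tr1" otus="t1"/>
</nexml>"""


def resolve_references(root):
    """Return (referrer id, attribute name, target id) for every reference.

    An attribute counts as a reference when its name equals the tag name
    of some element in the document and its value is that element's id.
    """
    by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}
    tag_names = {el.tag for el in root.iter()}
    links = []
    for el in root.iter():
        for attr, value in el.attrib.items():
            if attr == "id" or attr not in tag_names:
                continue
            target = by_id.get(value)
            if target is None or target.tag != attr:
                raise ValueError(f"{attr}={value!r} does not point at a <{attr}> element")
            links.append((el.get("id"), attr, target.get("id")))
    return links


links = resolve_references(ET.fromstring(DOC))
for referrer, attr, target in links:
    print(f"{referrer} --{attr}--> {target}")
```

Naming the attribute after the referenced element lets a processor check both that the id exists and that it belongs to the expected kind of element.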

15 Implementation (1/6) Approach
o Schema design
o Community feedback through wiki, email, telecon, projects (evoinfo, ppod, MIAPA) etc.
o Development of processors (perl, java, python, c++, VB, JavaScript) in parallel
o Experiments with xml tools (ws, db, data binding tools)

16 Implementation (2/6) Entity relationships

17 Implementation (3/6) Inheritance tree for elements

18 Implementation (4/6) Anatomy of a “block”
<characters id="c1" xsi:type="nex:DnaSeqs" otus="t1">
    desc description …
    Contents…
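As a hedged illustration of how a processor could read the attributes of such a block, the sketch below parses the opening tag from this slide as a self-closed element. The `xsi` namespace URI is the standard XML Schema instance namespace; the `nex` namespace URI is a placeholder invented for this example, and splitting the type name into a data type and a `Cells`/`Seqs` suffix follows the class names listed on slide 19.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# The opening tag from the slide, self-closed so it parses on its own.
# The nex namespace URI below is a placeholder, not the real one.
BLOCK = (
    '<characters xmlns:nex="http://example.org/nexml-placeholder" '
    f'xmlns:xsi="{XSI}" '
    'id="c1" xsi:type="nex:DnaSeqs" otus="t1"/>'
)


def describe_block(xml_text):
    """Pull the id, the otus reference and the concrete type out of a block."""
    element = ET.fromstring(xml_text)
    # ElementTree expands the attribute name to {namespace}type but leaves
    # the prefixed value ("nex:DnaSeqs") as a plain string.
    xsi_type = element.get(f"{{{XSI}}}type")
    _prefix, _sep, local_name = xsi_type.rpartition(":")
    datatype, layout = local_name, None
    for suffix in ("Cells", "Seqs"):
        if local_name.endswith(suffix):
            datatype, layout = local_name[: -len(suffix)], suffix
            break
    return {
        "element": element.tag,
        "id": element.get("id"),
        "otus": element.get("otus"),
        "datatype": datatype,
        "layout": layout,
    }


info = describe_block(BLOCK)
print(info)
```

The element name stays generic (`characters`) while `xsi:type` selects the concrete subclass, so a processor can dispatch on the type without needing a separate element name per data type.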

19 Implementation (5/6) Character Classes
              Cells              Sequence
DNA           DnaCells           DnaSeqs
RNA           RnaCells           RnaSeqs
Protein       ProteinCells       ProteinSeqs
Standard      StandardCells      StandardSeqs
Continuous    ContinuousCells    ContinuousSeqs
Restriction   RestrictionCells   RestrictionSeqs

20 Implementation (6/6) Tree Classes
           Int          Float
Tree       IntTree      FloatTree
Network    IntNetwork   FloatNetwork

21 Current status (1/4) Schema blocks
Done:
o OTUs
o characters: dna, rna, nucleotide, protein, categorical, continuous, restriction (compact and verbose)
o trees: graphml trees and networks, various edge formats and rootings

22 Current status (2/4) Parsers and writers
Nexml parsers and writers:
o mesquite (java NeXML class libraries)
o Bio::Phylo (BioPerl compatible)
o pyNexml (python)
o DAMBE (Visual Basic)
o NCL (C++)
o JavaScript

23 Current status (3/4) Experiments
o Semantic annotation (CDAO) using SAWSDL
o Scalability:
    o Indexed files in dbxml
    o Created large files from tolweb, rbcl
    o XInclude with tinyseq xml
o REST Web services:
    o ToL service
    o validation service
    o nexml2json, nexus2xml
o Schema inclusion in wsdl

24 Current status (4/4) To do
o Publish standard
o More restricted vocabulary attachments (e.g. Darwin core, CDAO-mediated terms)
o Substitution model descriptions
o Sets (in progress, using class identifiers)
o Distances
o Splits

25 NeXML
o Base URL: http://www.nexml.org
o Wiki: /wiki
o Mailing list: /mail
o Issue tracker: /tracker
o SVN repository: /code
o EvoInfo: http://evoinfo.nescent.org
o CDAO: http://www.evolutionaryontology.org

27 Acknowledgements
o Contributions: Jason Caravas, Mark Holder, Peter Midford, Jeet Sukumaran, Xuhua Xia
o Feedback: wg-evoinfo, pPOD, Wayne Maddison, David Maddison
o Additional funding, support: NESCent, GSoC

