Published by Lesley Jenkins. Modified over 8 years ago.
1
Lecture 2.01 Data Modeling I: XML & XSLT Marc Dumontier Blueprint Initiative Samuel Lunenfeld Research Institute Mount Sinai Hospital Toronto, ON
2
Lecture 2.02 Data Modeling I: XML & XSLT Data Modeling Concepts Extensible Markup Language (XML) Extensible Stylesheet Language Transformations (XSLT)
3
Lecture 2.03 Data Modeling Concepts Data modeling involves considering how to represent data objects within a system, both logically and physically. Tools are processes which act on data. Many tools in the bioinformatics field are used in a pipeline to produce meaningful conclusions. Example: given a list of GIs, retrieve the Gene Ontology terms and generate an XML document, which is then transformed into different presentations, such as PDF and HTML, using XSLT.
4
Lecture 2.04 File Formats The format of your input data will be the largest factor in the design and architecture of your tool. Complex data must be distributed in a format that is based on a specification or model which sets the constraints and organizational structure. A good file format is self-documenting. Parsing structured data. Validation of data.
5
Lecture 2.05 File Formats (ASN.1)
6
Lecture 2.06 File Formats (GO Flat File)
7
Lecture 2.07 File Formats (XML)
8
Lecture 2.08 Data Modeling I: XML & XSLT Data Modeling Concepts Extensible Markup Language (XML) Extensible Stylesheet Language Transformations (XSLT)
9
Lecture 2.09 Extensible Markup Language XML is a framework for defining markup languages. Based on Standard Generalized Markup Language (SGML). XML makes data portable. Human readable. Unicode character set (every XML processor must accept the UTF-8 and UTF-16 encodings). XML Schema/Document Type Definition (DTD). XML Namespaces. Programmable interfaces: SAX/DOM. XPath.
10
Lecture 2.010 Extensible Markup Language
11
Lecture 2.011 Well Formed Document contains exactly one root element, which encloses all other elements. Every start tag must have a corresponding end tag. Tags must be properly nested. Attribute values must be enclosed in quotes. Tag names must be valid XML names.
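A quick way to try these rules is to hand a document to an XML parser, since a conforming parser rejects input that is not well formed. The sketch below uses Python's standard xml.etree.ElementTree; the helper name is_well_formed and the sample documents are illustrative assumptions, not part of the lecture.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts `text` as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents, one per rule on the slide.
samples = {
    "valid": '<Protein id="1"><GI>123</GI></Protein>',
    "two roots": "<Protein/><Protein/>",
    "missing end tag": "<Protein><GI>123</Protein>",
    "bad nesting": "<Protein><GI></Protein></GI>",
    "unquoted attribute": "<Protein id=1/>",
    "invalid name": "<1Protein/>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'well formed' if ok else 'rejected'}")
```

Only the first sample is accepted; each of the others breaks one rule from the list.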
12
Lecture 2.012 Well Formed
13
Lecture 2.013 Well Formed
14
Lecture 2.014 Well Formed
15
Lecture 2.015 Well Formed
16
Lecture 2.016 Well Formed
17
Lecture 2.017 Definitions Tag: The words between < and > are XML tags. Element: The information from the start of a start tag to the end of an end tag, e.g. <tag>Bioinformatics is fun</tag>. Attributes: name/value pairs which are in tags. PCDATA: Parsed Character Data, e.g. the text "This is some PCDATA" inside an element.
18
Lecture 2.018 Elements Naming rules –Start with a letter or underscore –After the first character, numbers are allowed as well as “.” and “-” –Names can’t contain spaces –Names shouldn’t contain “:” (reserved for XML Namespaces) –Names can’t start with the letters XML (any case)
19
Lecture 2.019 White Space Includes the space character, tabs, new lines (carriage return and line feed). No white space stripping (unlike HTML) Extraneous white space
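As a small illustration of the "no stripping" point, the sketch below parses an element with padded content using Python's standard xml.etree.ElementTree and shows that the white space reaches the application unchanged. The element content is a made-up example.

```python
import xml.etree.ElementTree as ET

# Made-up content with leading, inner and trailing white space.
doc = "<Taxon>\n    Homo   sapiens\t</Taxon>"

# The parser hands the character data over exactly as written.
text = ET.fromstring(doc).text
print(repr(text))

# An application that does not want the padding has to remove it itself.
normalized = " ".join(text.split())
print(repr(normalized))
```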
20
Lecture 2.020 Comments Comments are nodes too!
21
Lecture 2.021 Processing Instructions Processing Instructions are used to provide information to an application. These are not a necessary part of an XML document, but the XML processor is required to pass them to an application. (can contain attributes)
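The following sketch shows how a DOM parser exposes processing instructions and comments to an application as nodes. It uses Python's standard xml.dom.minidom; the xml-stylesheet instruction and the comment text are illustrative assumptions (test.xsl is simply reused as a file name).

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<?xml-stylesheet type="text/xsl" href="test.xsl"?>'
    "<!-- illustrative comment -->"
    "<Protein-list/>"
)

pi, comment, element = doc.childNodes

# A processing instruction has a target and a data string. What looks like
# attributes is just text that the receiving application must interpret.
print(pi.target)
print(pi.data)

# The comment is also delivered as a node of its own.
print(comment.data)

node_types = [node.nodeType for node in doc.childNodes]
```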
22
Lecture 2.022 XML Namespaces An XML namespace is a collection of names, identified by a URI reference, which are used in XML documents as element and attribute names. Avoids element name collisions. Allows for the combination of vocabularies into single XML documents. * URI: Uniform Resource Identifier
23
Lecture 2.023 XML Namespaces Define a namespace prefix and point it to a URI
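A minimal sketch of how prefixes and URIs work together, using Python's standard xml.etree.ElementTree. Both namespace URIs, the prefixes, and the element content other than the accession are made up for illustration; a namespace URI is only an identifier and does not need to resolve to anything.

```python
import xml.etree.ElementTree as ET

BIO = "http://example.org/bio"      # hypothetical namespace URI
LIB = "http://example.org/library"  # hypothetical namespace URI

doc = f"""
<bio:Protein-list xmlns:bio="{BIO}" xmlns:lib="{LIB}">
  <bio:Accession>NP_002077</bio:Accession>
  <lib:Accession>shelf-42</lib:Accession>
</bio:Protein-list>
"""
root = ET.fromstring(doc)

# The parser replaces each prefix with its URI, so the two Accession
# elements no longer collide.
tags = [child.tag for child in root]
print(tags)

# The prefix is only shorthand: a query may use a different prefix
# as long as it is bound to the same URI.
bio_accession = root.find("b:Accession", {"b": BIO}).text
lib_accession = root.find("l:Accession", {"l": LIB}).text
```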
24
Lecture 2.024 Entity References Can be thought of as a variable. Used to represent special characters in text nodes. –&lt; less-than sign (<) –&gt; greater-than sign (>) –&amp; ampersand (&) –&quot; quote (double) –&apos; quote (single) Can even define your own!
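The sketch below shows the five predefined entity references being resolved by a parser, the reverse step when writing text out, and a user-defined entity declared in an internal DTD subset. It uses Python's standard library; the sample text and the entity name org are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Parsing: the predefined entity references become ordinary characters.
parsed = ET.fromstring(
    "<note>a &lt; b &amp; c &gt; d &quot;e&quot; &apos;f&apos;</note>"
).text
print(parsed)

# Writing: escape() replaces &, < and > so raw text is safe inside an element.
escaped = escape("a < b & c > d")
print(escaped)

# Defining your own: "org" is a made-up entity declared in the internal subset.
custom = ET.fromstring(
    '<!DOCTYPE note [<!ENTITY org "Blueprint Initiative">]>'
    "<note>&org;</note>"
).text
print(custom)
```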
25
Lecture 2.025 Character References Character references allow authors to enter any Unicode character by its code position. This is mostly useful for specifying international characters, as well as non-visible characters. &#160; means code position 160 (non-breaking space). The first 256 Unicode positions are the same as ISO-8859-1 (ISO Latin 1). You can use hexadecimal by prefixing the position with an ‘x’, e.g. &#xA0;.
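A short sketch of decimal and hexadecimal character references, using Python's standard library. The wrapper element name s and the Greek alpha example are illustrative assumptions.

```python
import unicodedata
import xml.etree.ElementTree as ET

# The same character written as a decimal and as a hexadecimal reference.
decimal = ET.fromstring("<s>&#160;</s>").text
hexadecimal = ET.fromstring("<s>&#xA0;</s>").text
print(ord(decimal), unicodedata.name(decimal))

# For positions 0-255 Unicode and ISO-8859-1 agree.
latin1 = bytes([160]).decode("iso-8859-1")

# Positions above 255 exist only in Unicode, e.g. U+03B1.
alpha = ET.fromstring("<s>&#x3B1;</s>").text
print(unicodedata.name(alpha))
```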
26
Lecture 2.026 XML Authoring Software XMLSpy (http://www.altova.com)
27
Lecture 2.027 Software
28
Lecture 2.028 Extensible Markup Language Questions?
29
Lecture 2.029 Data Modeling I: XML & XSLT Data Modeling Concepts Extensible Markup Language (XML) Extensible Stylesheet Language Transformations (XSLT)
30
Lecture 2.030 Extensible Stylesheet Language http://www.w3.org/Style/XSL/ XSL is a family of recommendations for defining XML document transformation and presentation. It consists of three parts: –XSL Transformations (XSLT) –XML Path Language (XPath) –XSL Formatting Objects (XSL-FO)
31
Lecture 2.031 XSLT (diagram: XSL comprises XSLT, XPath, and XSL-FO)
32
Lecture 2.032 XSLT A language for transforming XML. Allows for the extraction, simplification, and reorganization of XML documents without writing programs which use SAX or DOM. Transformations are defined by stylesheets. Transforms one XML document into another XML document, or into other types of document such as RTF, PDF (using XSL-FO), flat file, etc.
33
Lecture 2.033 XSLT Processors Transformations are generally run in 3 ways –Standalone XSLT processor –A client program such as IE/Netscape –Server-side such as a Java servlet, JSP page, or a Perl CGI. Xalan Java is an implementation of the W3C recommendations for XSLT and XPath. http://xml.apache.org java -jar xalan.jar -IN test.xml -XSL test.xsl
34
Lecture 2.034 Trees and Nodes In XSLT, an XML document is viewed as a tree data structure which consists of nodes. There are different types of nodes such as elements, attributes, comments, processing instructions, namespaces, and text. Root node vs. root element: the root node represents the whole document, and the root element is one of its children.
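The difference between the root node and the root element can be seen in any DOM implementation. The sketch below uses Python's standard xml.dom.minidom, where the Document object plays the role of the root node; the sample document is an illustrative assumption.

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<!-- header comment --><Protein-list><Protein/></Protein-list>"
)

root_node = doc                      # represents the entire document
root_element = doc.documentElement   # the single outermost element

# The root node has two children here: the comment and the root element.
top_level = [node.nodeName for node in root_node.childNodes]
print(top_level)

# The root element's parent is the root node, which itself has no parent.
print(root_element.parentNode is root_node)
print(root_node.parentNode)
```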
35
Lecture 2.035 Trees and Nodes
36
Lecture 2.036 Creating a stylesheet A stylesheet is an XML document whose root element is “stylesheet” from the XSL namespace. A stylesheet contains zero or more template elements. Each template element is a rule which specifies a transformation. The “apply-templates” element processes all of the children of the current node, including text nodes.
37
Lecture 2.037 Creating a stylesheet
38
Lecture 2.038 Creating a stylesheet
39
Lecture 2.039 Creating a stylesheet
40
Lecture 2.040 Creating a stylesheet
41
Lecture 2.041 Templates Accessing node values is done through the use of the <xsl:value-of> element. To insert spaces and output literals, you can use <xsl:text>. <xsl:copy> is for shallow copy. <xsl:copy-of> is for deep copy.
42
Lecture 2.042 Match Patterns Match patterns are a subset of the XPath language. Match patterns have 3 parts: –Pattern Axis –Node test –Predicate Can be used in <xsl:template>, <xsl:key>, and <xsl:number>. “/” matches the root node. “*” matches all element nodes.
43
Lecture 2.043 Match Patterns “Protein” matches all <Protein> elements. “Protein/GI” matches all <GI> elements that are children of a <Protein> element. “//Protein” matches all <Protein> elements at any depth. “.” matches the current node.
44
Lecture 2.044 Match Patterns – Pattern Axes A subset of XPath. There are 2 available in match patterns (XPath supports 13) –child –attribute Syntax of a step in a match pattern is: axis::node test [predicate] Default axis is the child axis Shortcut for attribute axis is “@”
45
Lecture 2.045 Match Patterns – Pattern Axes
46
Lecture 2.046 Match Patterns – Node Test Use the name of the node or the wild card “*” to select element nodes as well as node types. comment() – comment node node() – any type of node processing-instruction() – processing instruction text() – text node
47
Lecture 2.047 Match Patterns – Node Test
48
Lecture 2.048 Match Patterns – Node Test Found an element:
49
Lecture 2.049 Match Patterns – Node Test Found an element: Protein-list Found an element: Protein Found an element: Accession Found an element: GI Found an element: Taxon Found an element: Protein Found an element: Accession Found an element: GI Found an element: Taxon
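The stylesheet and input behind this output are not preserved in the transcript. As a stand-in, the sketch below reproduces the same listing in Python by visiting every element in document order, which is what a rule matching “*” does when templates are applied recursively. The input document is an assumption reconstructed from the output above, with element content left empty.

```python
import xml.etree.ElementTree as ET

# Assumed input: structure inferred from the output listing, content omitted.
doc = """
<Protein-list>
  <Protein><Accession/><GI/><Taxon/></Protein>
  <Protein><Accession/><GI/><Taxon/></Protein>
</Protein-list>
"""
root = ET.fromstring(doc)

# iter() yields the element itself and then all descendants in document order.
lines = [f"Found an element: {element.tag}" for element in root.iter()]
print("\n".join(lines))
```

This is plain tree traversal rather than XSLT, so it only mirrors the effect of the node test.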
50
Lecture 2.050 Match Patterns – Predicates Predicates contain XPath expressions, enclosed in the [ ] operator. Tests whether a condition is true. –Test the value of a text node or attribute –Test the existence of a child element –Test the position of a node in the node tree. Boolean conditions are combined with the “and” and “or” operators; “|” is the union operator, used to combine alternative patterns.
51
Lecture 2.051 Match Patterns – Predicates The Last Protein is
52
Lecture 2.052 Match Patterns – Predicates The Last Protein is NP_002077
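The three kinds of predicate test can be tried outside XSLT with the limited XPath support in Python's standard xml.etree.ElementTree. In the sketch below only the accession NP_002077 and taxid 9606 come from the slides; the other values, and the layout of the document, are placeholders assumed for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed input; XX_000001 and the GI values are placeholders.
doc = """
<Protein-list>
  <Protein>
    <Accession>XX_000001</Accession>
    <GI>1</GI>
  </Protein>
  <Protein>
    <Accession>NP_002077</Accession>
    <GI>2</GI>
    <Taxon taxid="9606"/>
  </Protein>
</Protein-list>
"""
root = ET.fromstring(doc)

# Position tests: [last()] and the numeric shortcut [1].
last_accession = root.find("Protein[last()]/Accession").text
first_accession = root.find("Protein[1]/Accession").text
print("The Last Protein is", last_accession)

# Existence test: only proteins that have a Taxon child.
with_taxon = root.findall("Protein[Taxon]")

# Value test: compare the text of a child element.
by_value = root.findall("Protein[Accession='NP_002077']")
```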
53
Lecture 2.053 Conditional Processing Create a test using <xsl:if> and make processing conditional on that test. There is no “else” in XSLT. To make an if/else statement, use <xsl:choose>, <xsl:when>, and <xsl:otherwise>.
54
Lecture 2.054 Conditional Processing
55
Lecture 2.055 Conditional Processing
56
Lecture 2.056 Conditional Processing To loop through a node set, use <xsl:for-each>. The expression in the select attribute must evaluate to a node-set.
57
Lecture 2.057 Conditional Processing
58
Lecture 2.058 XPath XML Path Language is a language for addressing parts of an XML document, designed to be used by both XSLT and XPointer. The language mainly consists of location paths and expressions. Specification available at http://www.w3.org/TR/xpath
59
Lecture 2.059 XPath - Datatypes There are four data types. –Node Sets –Booleans –Numbers –Strings
60
Lecture 2.060 XPath – Node Sets Can be zero, one, or more nodes. XPath expressions that return node sets are called location paths. Functions which operate on Node Sets are –count(node-set) –last() –local-name(node-set) –name(node-set) –position()
61
Lecture 2.061 XPath – Node Sets
62
Lecture 2.062 XPath - Numbers Numbers are stored in double floating point format. The following operators can be used with numbers: +, -, *, div, mod Functions which operate on Numbers are: –ceiling() –floor() –round() –sum()
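Two of these operations behave differently from their look-alikes in many programming languages: in XPath 1.0, mod is the remainder of a truncating division, and round() sends ties toward positive infinity. The sketch below mimics both in Python for comparison; the function names xpath_mod and xpath_round are hypothetical, and edge cases such as division by zero, NaN and infinities are not handled.

```python
import math


def xpath_mod(a: float, b: float) -> float:
    """Approximate XPath 1.0 `mod`: the result takes the sign of `a`."""
    return math.fmod(a, b)


def xpath_round(x: float) -> float:
    """Approximate XPath 1.0 round(): ties go toward positive infinity."""
    return float(math.floor(x + 0.5))


# `div` is ordinary floating-point division, like Python's `/`.
quotient = 7 / 2

# Python's own % and round() follow different conventions.
comparison = {
    "-5 mod 2 (XPath)": xpath_mod(-5, 2),
    "-5 % 2 (Python)": -5 % 2,
    "round(2.5) (XPath)": xpath_round(2.5),
    "round(2.5) (Python)": round(2.5),
    "round(-2.5) (XPath)": xpath_round(-2.5),
}
for label, value in comparison.items():
    print(label, "=", value)
```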
63
Lecture 2.063 XPath - Numbers
64
Lecture 2.064 XPath - Strings String support is not extensive in XSLT. Functions which operate on Strings are: –concat(string string1, string string2, …) –contains(string string1, string string2) –starts-with(string string1,string string2) –string-length(string string1) –substring(string string1, number offset, number length) …
65
Lecture 2.065 XPath - Strings
66
Lecture 2.066 XPath - Booleans Evaluate to either true or false. Numbers are false if equal to zero (or NaN). Strings are false if empty. Node Sets are false if empty. Comparison operators are =, !=, <, <=, >, >=. Boolean operators are ‘and’ and ‘or’; negation uses the not() function.
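The conversion rules above can be written down as a small function. This is a sketch in Python rather than XPath itself: the name xpath_boolean is hypothetical, and a Python list stands in for a node set.

```python
import math


def xpath_boolean(value) -> bool:
    """Convert a value to a boolean following the XPath 1.0 rules."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        # Numbers are false only when zero or NaN.
        return not (value == 0 or math.isnan(value))
    if isinstance(value, str):
        # Strings are false only when empty, so "false" is true.
        return len(value) > 0
    if isinstance(value, (list, tuple)):
        # A list or tuple stands in for a node set here.
        return len(value) > 0
    raise TypeError(f"no XPath data type corresponds to {type(value).__name__}")


examples = [0, 3.14, float("nan"), "", "false", [], ["node"]]
for example in examples:
    print(repr(example), "->", xpath_boolean(example))
```

Note that Python's built-in bool() treats NaN as true, so the two conventions differ on that one case.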
67
Lecture 2.067 XPath - Booleans
68
Lecture 2.068 Creating XPath Location Paths A Location Path is a sequence of Location Steps. Each location step is separated by either ‘/’ or ‘//’. To start from the root node, start the Location Path with ‘/’ (absolute path). Otherwise, the Location Path starts from the context node (relative path). A Location Step is made up of an axis, a node test, and zero or more predicates.
69
Lecture 2.069 Creating XPath Location Paths Examples –Protein-list –Protein-list/*/Accession –/Protein-list/Protein/Taxon[@taxid = 9606] –/Protein-list/Protein[position() = 1]/GI –//GI –.. –Protein[Accession] –Protein-list//Taxon[@taxid != 0]
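Several of these paths can be tried with Python's standard xml.etree.ElementTree, which implements a subset of XPath. The sketch below makes some assumptions: paths are evaluated relative to the Protein-list element (ElementTree does not accept absolute paths on an element), attribute values in predicates must be written as quoted strings, and the document content other than NP_002077 and taxid 9606 is made up.

```python
import xml.etree.ElementTree as ET

# Assumed input; XX_000001, the GI values and taxid 10090 are placeholders.
doc = """
<Protein-list>
  <Protein>
    <Accession>NP_002077</Accession>
    <GI>1</GI>
    <Taxon taxid="9606"/>
  </Protein>
  <Protein>
    <Accession>XX_000001</Accession>
    <GI>2</GI>
    <Taxon taxid="10090"/>
  </Protein>
</Protein-list>
"""
root = ET.fromstring(doc)  # context node: the Protein-list element

# Wildcard step: Accession grandchildren under any child element.
accessions = [e.text for e in root.findall("*/Accession")]

# Attribute predicate on the last step.
human = root.findall("Protein/Taxon[@taxid='9606']")

# '//' reaches descendants at any depth below the context node.
gis = [e.text for e in root.findall(".//GI")]

# '..' steps back up to the parent of each selected node.
parents = [e.tag for e in root.findall(".//GI/..")]

print(accessions, len(human), gis, parents)
```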
70
Lecture 2.070 XPath Location Steps: Axes ancestor – the parent of the context node, its parent, and so on up to the root node. ancestor-or-self. attribute – attributes of context node. child – children of context node. descendant – all children, grand-children, etc. of context node. descendant-or-self. following – every node in the document which comes after the context node in document order, excluding its descendants.
71
Lecture 2.071 XPath Location Steps: Axes following-sibling – all siblings of context node that follow it (siblings share the same parent). namespace – namespace nodes of context node. parent – parent of the context node. preceding – all nodes that come before the context node in document order, excluding its ancestors. preceding-sibling – all siblings of context node that come before it (document order). self – context node.
72
Lecture 2.072 XPath Location Steps: Axes Examples –child::GI –descendant::Accession –/child::Protein-list/child::Protein/child::Taxon/attribute::taxid –ancestor::node()
73
Lecture 2.073 XPath Location Steps: Node Tests Use the names of nodes or wildcard ‘*’. Can select based on type of node –comment() –node() –processing-instruction() –text()
74
Lecture 2.074 XPath Location Steps: Node Tests
75
Lecture 2.075 XPath Location Steps: Predicates A condition which evaluates to true or false. Shortcut: [3] = [position() = 3] Examples –Test for the existence of nodes. –Test for the value of strings in attributes or text nodes. –Test for the position of that node.
76
Lecture 2.076 Named Templates A template can be given a name, and then explicitly called using <xsl:call-template>. Parameters can be passed to templates this way, much like functions are called in programming languages. Pass parameters using <xsl:with-param>. Declare the parameters a template accepts using <xsl:param>.
77
Lecture 2.077 Named Templates
78
Lecture 2.078 Variables Variables are used to store values. Any XPath data type can be stored. The value cannot be changed once set. To access a variable from XPath, prefix the variable name with “$”. To create a variable, use <xsl:variable>. Can be used as a top-level element (global scope) or inside a template body (local scope).
79
Lecture 2.079 Variables
80
Lecture 2.080 XSLT Questions?