XSL Formatting Objects: Pagination of Documents
Generating PDF
XML document + XSLT style sheet -> XSLT engine -> XSL-FO document -> XSL-FO formatter -> printable document
The transformer produces an XSL-FO document. That document is then run through FOP.
XSLT Stylesheet (t31) <xsl:stylesheet xmlns:xsl=" [1] xmlns:fo=" [2] version="1.0"> [3].... [4] Other templates go here. [5]
XSLT Stylesheet Explanation
[1] and [2] declare the namespaces of the XSLT and FO content in this document, respectively. They distinguish transformation requests from output content: if the XSLT engine sees content in the FO namespace, it simply writes it to the output. [3] says that the output document must be XML, which is exactly what an XSL-FO document is (this line is not strictly needed). [4] is the root template, which fires first, so this is the place to add the essential outline content. [5] is the place to add the templates that do the useful processing.
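The slide code did not survive in full, so the following is only a sketch of how the five numbered parts might fit together. The namespace URIs are the standard W3C ones. The page geometry and the source element name para are assumptions made for illustration, and the sketch assumes all source text sits inside para elements.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- [3] the result is an XML document (optional in most cases) -->
  <xsl:output method="xml" indent="yes"/>

  <!-- [4] root template: fires first and emits the XSL-FO outline -->
  <xsl:template match="/">
    <fo:root>
      <fo:layout-master-set>
        <fo:simple-page-master master-name="A4"
            page-height="29.7cm" page-width="21.0cm"
            margin-top="1.5cm" margin-bottom="1.5cm"
            margin-left="1.5cm" margin-right="1.5cm">
          <fo:region-body/>
        </fo:simple-page-master>
      </fo:layout-master-set>
      <fo:page-sequence master-reference="A4">
        <fo:flow flow-name="xsl-region-body">
          <xsl:apply-templates/>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>

  <!-- [5] other templates; "para" is a hypothetical source element -->
  <xsl:template match="para">
    <fo:block><xsl:apply-templates/></fo:block>
  </xsl:template>

</xsl:stylesheet>
```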
XSL-FO Document [1] [2] [3] [4] content … [5]
XSL-FO Document explanation
[1] To lay out content on a page, the formatter needs to know what page size is used. [2] The layout-master-set contains the simple-page-master, which holds information such as whether the page is European A4 or US Letter. It also contains the region-body element, which is the main body of the page layout. [3] The page-sequence element supports complex pagination. For a simple page layout it needs very little content beyond a reference back to a particular page definition (the simple-page-master). [4] Within the page-sequence element, a flow element identifies which region of the page the text is poured into; that is the reason for the value xsl-region-body. It distinguishes the body of the page from the outer areas (margins, header...). [5] The content is a child of the main flow. Here it is a block of text (rectangular in shape, taking defaults for everything), which is placed as the first item on the page.
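A minimal XSL-FO document along the lines described above might look like this. It is a sketch, not the original slide code: the page size, the body margin and the sample text are assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- page definitions: size of the page and its body region -->
  <fo:layout-master-set>
    <fo:simple-page-master master-name="A4"
        page-height="29.7cm" page-width="21.0cm">
      <fo:region-body margin="2cm"/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <!-- a page sequence that refers back to the page definition -->
  <fo:page-sequence master-reference="A4">
    <!-- the flow pours its content into the body region -->
    <fo:flow flow-name="xsl-region-body">
      <!-- a block of text, placed as the first item on the page -->
      <fo:block>Hello, XSL-FO.</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
```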
XSL-FO Structure
XSL-FO
Formatting objects in an XSL-FO document specify the order in which content is to be placed on pages. Formatting properties are attributes on the individual formatting object elements.
Formatting objects
Formatting objects describe the layout of the document in detail.
There are 56 XSL formatting object elements, for example a block, a character, a table cell, etc.
Most FO elements signify various kinds of rectangular areas.
Formatting features such as the width of a border or the font size are attributes of the FO elements.
XSL has two parts: XSLT (transformations) and FO (formatting objects, FO-DTD).
Area tree
In XSL, one creates a tree of formatting objects that serve as inputs or specifications to a formatter. The formatter generates an ordered tree, the area tree, which describes a geometric structuring of the output medium. The terms child, sibling, parent, descendant, and ancestor refer to this tree structure. The tree has a root node. Each area tree node other than the root is called an area and is associated with a rectangular portion of the output medium.
Formatting Object Tree
The root node must be an fo:root formatting object. The children of the fo:root formatting object are a single fo:layout-master-set, an optional fo:declarations, and a sequence of one or more fo:page-sequences. The fo:layout-master-set defines the geometry and sequencing of the pages. The fo:declarations object is a wrapper for formatting objects whose content is to be used as a resource to the formatting process. Children of the fo:page-sequences, which are called flows, provide the content that is distributed into the pages.
Tree Structure (generated inside the root template, match="/")
root
  layout-master-set
    simple-page-master master-name="Cover"
    simple-page-master master-name="A4first"
    simple-page-master master-name="A4"
    page-sequence-master master-name="contents"
      repeatable-page-master-alternatives
        conditional-page-master-reference page-position="first" master-reference="A4first"
        conditional-page-master-reference page-position="rest" master-reference="A4"
  page-sequence master-reference="Cover"
    flow
      block
    …..
  page-sequence master-reference="contents"
    ……………..
Page Layout
Page layouts are called page masters. The fo:layout-master-set contains one or more fo:simple-page-master elements that define master pages.
The fo:simple-page-master element has these attributes:
  master-name: the name by which page sequences will reference this master page
  page-height: the height of the page
  page-width: the width of the page
  margin-bottom, margin-left, margin-right, and margin-top, or the shorthand margin: the page margins
  writing-mode: determines which direction text flows on the page, for example left-to-right, right-to-left or top-to-bottom
  reference-orientation: specifies, in 90-degree increments, how much the content is rotated
Page Layout and Region
Regions
The highest-level containers in XSL-FO are the regions: 1. fo:region-body 2. fo:region-before 3. fo:region-after 4. fo:region-start 5. fo:region-end. They are surrounded by the page margins (margin-top, margin-bottom, ...). A book page, for example, has a header, the main body of the page and a footer.
Blocks
A block area represents a block-level element, such as a paragraph or a list item. Although block areas may contain other block areas, there should always be a line break before the start and after the end of each block area. A block area, rather than being precisely positioned by coordinates, is placed sequentially in the area that contains it. As other block areas are added and deleted before it, the block area's position shifts as necessary to make room. A block area may contain parsed character data, inline areas, line areas, and other block areas. Formatting objects that produce block areas include fo:block, fo:table-and-caption, and fo:list-block.
Line Areas
A line area represents a line of text inside a block. For example, each of the lines in this list item is a line area. Line areas can contain inline areas and inline spaces. There are no formatting objects that correspond to line areas. Instead, the formatting engine calculates the line areas as it decides how to wrap lines inside block areas.
Inline Areas
Inline areas are parts of a line such as a single character, a footnote reference, or a mathematical equation. Inline areas can contain other inline areas and raw text. Formatting objects that produce inline areas include 1. fo:character 2. fo:external-graphic 3. fo:inline 4. fo:instream-foreign-object 5. fo:leader 6. fo:page-number.
Some examples:
  putting the first line of a paragraph into small caps
  turning a normally inline formatting object, fo:external-graphic, into a block by "wrapping" it in an fo:block formatting object
  formatting a running footer containing the word "Page" followed by a page number
Area Formatting Properties
  Area border: border-color, border-width, …
  Area background: background-color, background-image, …
  Area marginals: margin-bottom, space-after, …
  Area inner marginals: padding-before, padding-after, …
  Writing direction
  Scrolling
  …
There are more than 200 different formatting properties. Most properties can be applied to more than one kind of formatting object element. For example, you use identical code to format a fo:title in 14-point Times bold as you do to format a fo:block in 14-point Times bold.
Formatting objects summary
Example tree
(Diagram: root, containing fo:layout-master-set with simple-page-master, and page-sequence with flow.)
The root of the document is fo:root. This element contains a fo:layout-master-set and a fo:page-sequence. The fo:simple-page-master describes a kind of page on which content will be placed. Content is placed on copies of the master page using a fo:page-sequence. The fo:page-sequence has a master-reference attribute naming the master page to be used. Its fo:flow child element holds the actual content to be placed on the pages.
Example FOs (lab 32)
  root: root of the document; contains a layout-master-set and a page-sequence.
  simple-page-master: describes a kind of page on which content will be placed.
  page-sequence: content is placed on copies of the master page.
  flow: holds the actual content to be placed on the pages.
  block: the fo:block formatting object is commonly used for formatting paragraphs, titles, headlines, figure and table captions, etc.
  <xsl:stylesheet version="1.0" xmlns:xsl=" xmlns:fo="
Example PDF
All with default settings!
Layout-master-set
fo:root must be the root node. fo:layout-master-set is a wrapper around all masters used in the document. The regions on the page are analogous to "frames" in an HTML document: the region body and the "margins"...
Region Elements
Define the page type and its layout (cover page, table of contents, body text, etc.)
Example Region
  <fo:simple-page-master master-name="A4" page-height="29.7cm" page-width="21.0cm" margin-top="1.5cm" margin-bottom="1.5cm" margin-left="1.5cm" margin-right="1.5cm">
    <fo:region-body text-align="right" margin-top="20mm" margin-bottom="25mm"/>
  </fo:simple-page-master>
The fo:simple-page-master is used in the generation of pages and specifies the geometry of the page. The page may be subdivided into up to five regions: region-body, region-before, region-after, region-start, and region-end. The fo:region-before and fo:region-after elements each have an extent attribute that gives the height of these regions. The fo:region-body does not have an extent attribute. Instead, the size of the body is everything inside the page margins. Note that the start, before, end and after regions LIE INSIDE THE BODY AREA, so the body needs margins at least as large as their extents if the regions are not to overlap.
Example Page / Region Layout in PDF
  page: margin-top="1.5cm"
  region-body: margin-top="20mm"
  region-before: extent="20mm"
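Putting the three values above together, a page master with a header and a footer region could be sketched as follows. The region-after extent of 25mm is an assumption, chosen to match the body's bottom margin.

```xml
<fo:simple-page-master xmlns:fo="http://www.w3.org/1999/XSL/Format"
    master-name="A4"
    page-height="29.7cm" page-width="21.0cm"
    margin-top="1.5cm" margin-bottom="1.5cm"
    margin-left="1.5cm" margin-right="1.5cm">
  <!-- the body margins reserve room for the header and footer regions -->
  <fo:region-body margin-top="20mm" margin-bottom="25mm"/>
  <!-- header region: 20mm high, at the top inside the page margins -->
  <fo:region-before extent="20mm"/>
  <!-- footer region: 25mm high (assumed value) -->
  <fo:region-after extent="25mm"/>
</fo:simple-page-master>
```

Because the body's margin-top equals the extent of region-before, header and body text do not overlap.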
Sequence
In addition to a fo:layout-master-set, each formatting object document contains one or more fo:page-sequence elements. Each page in the sequence has an associated page master that defines how the page will look. Which page master this is, is determined by the master-reference attribute of the fo:page-sequence element. This must match the name of a page master in the fo:layout-master-set. For instance, you could have one master page for the first page of each chapter, a different one for all the subsequent left-hand pages, and a third for all the subsequent right-hand pages. Or there could be one simple page master for a table of contents, another for body text, and a third for the index. In this case, you use one page sequence each for the table of contents, the body text, and the index.
Each page sequence contains three kinds of child elements, in this order:
  1. An optional fo:title element containing inline content that can be used as the title of the document. This would normally be placed in the title bar of the browser window, like the TITLE element in HTML.
  2. Zero or more fo:static-content elements containing text to be placed on every page.
  3. One fo:flow element containing data to be placed on each page in turn.
The main difference between a fo:flow and a fo:static-content is that text from the flow isn't placed on more than one page, whereas the static content is.
Flow
The fo:flow object holds the actual content, which will be placed on the instances of the master pages. This content is composed of a sequence of fo:block, fo:block-container, fo:table-and-caption, fo:table, and fo:list-block elements. The flow-name attribute, here with the value xsl-region-body, specifies which of the five regions of the page the content will be placed in. The allowed values are:
  xsl-region-body
  xsl-region-before
  xsl-region-after
  xsl-region-start
  xsl-region-end
For example, content for the header has a flow-name value of xsl-region-before, and content for the body has a flow-name of xsl-region-body. No two flows in the same page sequence may have the same name. Note that in XSL 1.0 a fo:page-sequence has exactly one fo:flow child; content for the other regions, such as the header, is supplied by fo:static-content elements that carry the corresponding flow-name.
Static: Header and Footer
Each piece of the content of a fo:static-content element appears on every page. For example, both the header at the top of the page and the footer at the bottom of the page are produced by fo:static-content elements. fo:static-content elements must appear before the fo:flow element. fo:static-content elements have the same attributes and contents as a fo:flow. However, because a fo:static-content cannot break its contents across multiple pages, it generally has less content than a fo:flow.
(Example header and footer text in the figure: Institute of Technology, EVITech.)
Page numbering
The fo:page-sequence element has eight optional attributes that define page numbers for the sequence: initial-page-number, force-page-count, format, letter-value, country, language, grouping-separator, and grouping-size.
The initial-page-number attribute gives the number of the first page in this sequence. The most likely value for this attribute is 1, but it could be a larger number if the previous pages are in a different fo:page-sequence or even a different document.
  <fo:page-sequence master-reference="A4" initial-page-number="1" language="en" country="us">
Page sequence masters
It is common to have more than one master page, e.g. one master page for the first page of each chapter, a different one for all the subsequent left-hand pages, and a third for all the subsequent right-hand pages.
The fo:page-sequence-master element is a child of the fo:layout-master-set. It lists the order in which particular master pages will be instantiated, using one or more of these three child elements:
  fo:single-page-master-reference
  fo:repeatable-page-master-reference
  fo:repeatable-page-master-alternatives
The fo:single-page-master-reference and fo:repeatable-page-master-reference elements each have a master-reference attribute that specifies which fo:simple-page-master their pages are based on. The fo:repeatable-page-master-alternatives has child fo:conditional-page-master-reference elements that are instantiated based on various conditions. Each of these child fo:conditional-page-master-reference elements has a master-reference attribute that specifies which fo:simple-page-master to use if its condition is satisfied.
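A header and footer of this kind could be sketched as below. The sketch assumes that the page master named A4 defines a region-before and a region-after. The header text is taken from the slide figure; the footer with the word "Page" and a page number is an illustrative choice.

```xml
<fo:page-sequence xmlns:fo="http://www.w3.org/1999/XSL/Format"
    master-reference="A4">
  <!-- header: repeated in region-before on every page -->
  <fo:static-content flow-name="xsl-region-before">
    <fo:block text-align="center">Institute of Technology</fo:block>
  </fo:static-content>
  <!-- footer: the word "Page" followed by the current page number -->
  <fo:static-content flow-name="xsl-region-after">
    <fo:block text-align="center">Page <fo:page-number/></fo:block>
  </fo:static-content>
  <!-- the flow comes last and is broken across as many pages as needed -->
  <fo:flow flow-name="xsl-region-body">
    <fo:block>Body text goes here.</fo:block>
  </fo:flow>
</fo:page-sequence>
```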
single-page-master-reference
The simplest is fo:single-page-master-reference, whose master-reference attribute identifies one master page to be instantiated. For example, this fo:layout-master-set contains a fo:page-sequence-master element named contents that says that all text should be placed on a single instance of the master page named A4:
  <fo:simple-page-master master-name="A4" page-width…. >
This page sequence master only allows the creation of a single page. Technically, it's an error if there's more content than can fit on this one page. However, in practice most formatters simply repeat the last page used until they have enough pages to hold all the content.
repeatable-page-master-reference
Usually it is not known in advance exactly how many pages there will be. The fo:repeatable-page-master-reference element specifies that as many pages as necessary will be used to hold the content, all based on a single master page. The master-reference attribute identifies which master page will be repeated. A page sequence master that references the master page named A4 in this way uses as many copies of it as necessary to hold all the content. The maximum-repeats attribute limits the number of pages: for instance, with maximum-repeats="10" the fo:page-sequence-master generates at most 10 pages per document.
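Since the slide code is cut off, here is a sketch of what such a layout master set might contain. The page dimensions are assumptions; the names A4 and contents come from the text above.

```xml
<fo:layout-master-set xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:simple-page-master master-name="A4"
      page-height="29.7cm" page-width="21.0cm">
    <fo:region-body/>
  </fo:simple-page-master>
  <!-- exactly one page, based on the master page named A4 -->
  <fo:page-sequence-master master-name="contents">
    <fo:single-page-master-reference master-reference="A4"/>
  </fo:page-sequence-master>
</fo:layout-master-set>
```

A fo:page-sequence would then refer to it with master-reference="contents".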
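As a sketch of the repeatable case, the following page sequence master repeats the master page named A4, capped at 10 pages. Leaving out maximum-repeats removes the cap.

```xml
<fo:page-sequence-master xmlns:fo="http://www.w3.org/1999/XSL/Format"
    master-name="contents">
  <!-- repeat master page A4 as often as needed, but at most 10 times -->
  <fo:repeatable-page-master-reference master-reference="A4"
      maximum-repeats="10"/>
</fo:page-sequence-master>
```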
repeatable-page-master-alternatives
The fo:repeatable-page-master-alternatives element specifies different master pages for the first page, even pages, odd pages, blank pages, the last even page and the last odd page. For example, this page sequence master says that the first page should be based on the master page named letter_first but that all subsequent pages should use the master page named letter:
  <fo:conditional-page-master-reference page-position="first" master-reference="letter_first"/>
  <fo:conditional-page-master-reference page-position="rest" master-reference="letter"/>
If the content overflows the first page, the remainder will be placed on a second page. If it overflows the second page, a third page will be created. As many pages as needed to hold all the content will be constructed.
conditional-page-master-reference
Because a fo:repeatable-page-master-alternatives element needs to refer to more than one master page, it can't use a master-reference attribute as fo:single-page-master-reference and fo:repeatable-page-master-reference do. Instead, it has fo:conditional-page-master-reference child elements. Each of these has a master-reference attribute that identifies the master page to instantiate given that condition. The conditions themselves are determined by three attributes:
  page-position: can be set to first, last, rest, or any to apply only to the first page, the last page, any page except the first, or any page, respectively.
  odd-or-even: can be set to odd, even, or any to apply only to odd pages, only to even pages, or to all pages, respectively.
  blank-or-not-blank: can be set to blank, not-blank, or any to apply only to blank pages, only to pages that contain content, or to all pages, respectively.
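For context, the two conditional references sit inside the following wrapper elements. The master-name contents is an assumption borrowed from the tree structure slide.

```xml
<fo:page-sequence-master xmlns:fo="http://www.w3.org/1999/XSL/Format"
    master-name="contents">
  <fo:repeatable-page-master-alternatives>
    <!-- first page of the sequence -->
    <fo:conditional-page-master-reference
        page-position="first" master-reference="letter_first"/>
    <!-- every page after the first -->
    <fo:conditional-page-master-reference
        page-position="rest" master-reference="letter"/>
  </fo:repeatable-page-master-alternatives>
</fo:page-sequence-master>
```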
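The conditions can be combined to get the chapter layout mentioned earlier: one master for the first page, one for left-hand pages and one for right-hand pages. This is a sketch; the master names chapter, chapter_first, chapter_left and chapter_right are hypothetical and would have to be defined as fo:simple-page-master elements.

```xml
<fo:page-sequence-master xmlns:fo="http://www.w3.org/1999/XSL/Format"
    master-name="chapter">
  <fo:repeatable-page-master-alternatives>
    <!-- alternatives are tried in order; the first match is used -->
    <fo:conditional-page-master-reference
        page-position="first" master-reference="chapter_first"/>
    <!-- even page numbers: left-hand pages -->
    <fo:conditional-page-master-reference
        odd-or-even="even" master-reference="chapter_left"/>
    <!-- odd page numbers: right-hand pages -->
    <fo:conditional-page-master-reference
        odd-or-even="odd" master-reference="chapter_right"/>
  </fo:repeatable-page-master-alternatives>
</fo:page-sequence-master>
```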
Content
The content (as opposed to markup) of an XSL-FO document is mostly text. Non-XML content such as GIF and JPEG images can be included in a fashion similar to the IMG element of HTML. Other forms of XML content, such as MathML and SVG, can be embedded directly inside the XSL-FO document. This content is stored in several kinds of elements, including:
  Block-level formatting objects
  Inline formatting objects
  Table formatting objects
  Out-of-line formatting objects
All of these different kinds of elements are descendants of either a fo:flow or a fo:static-content element. They are never placed directly on page masters or page sequences.
Block-level formatting objects
A block-level formatting object is drawn as a rectangular area separated by a line break and possibly extra white space from any content that precedes or follows it. Blocks may contain other blocks, in which case the contained blocks are also separated from the containing block by a line break and perhaps extra white space. Block-level formatting objects include fo:block, fo:block-container, fo:table-and-caption, fo:table, and fo:list-block.
The fo:block element is the XSL-FO equivalent of display: block in CSS or DIV in HTML. Blocks may be contained in fo:flow elements, other fo:block elements, and fo:static-content elements. fo:block elements may contain other fo:block elements, other block-level elements such as fo:table and fo:list-block, and inline elements such as fo:inline and fo:page-number. Block-level elements may also contain raw text. The block-level elements generally have attributes for both area properties and text-formatting properties. The text-formatting properties are inherited by any child elements of the block unless overridden.
Inline formatting objects
An inline formatting object is also drawn as a rectangular area that may contain text or other inline areas. However, inline areas are most commonly arranged in lines running from left to right. When a line fills up, a new line is started below the previous one. The exact order in which inline elements are placed depends on the writing mode. For example, when working in Hebrew or Arabic, inline elements are first placed on the right and then fill to the left. Inline formatting objects include fo:bidi-override, fo:character, fo:external-graphic, fo:initial-property-set, fo:instream-foreign-object, fo:inline, fo:inline-container, fo:leader, fo:page-number, and fo:page-number-citation.
Table formatting objects
The table formatting objects are the XSL-FO equivalents of CSS2 table properties. However, tables do work somewhat more naturally in XSL-FO than in CSS. For the most part, an individual table is a block-level object, while the parts of the table aren't really either inline or block level. However, an entire table can be turned into an inline object by wrapping it in a fo:inline-container. There are nine XSL table formatting objects: fo:table-and-caption, fo:table, fo:table-caption, fo:table-column, fo:table-header, fo:table-footer, fo:table-body, fo:table-row, and fo:table-cell.
The root of a table is either a fo:table or a fo:table-and-caption that contains a fo:table and a fo:table-caption. The fo:table contains a fo:table-header, fo:table-body, and fo:table-footer. The table body contains fo:table-row elements that are divided up into fo:table-cell elements.
Leaders and Rules
A rule is a block-level horizontal line inserted into text, similar to the line below the chapter title on the first page of this chapter. The HR element in HTML produces a rule. A leader is a line that extends from the right side of left-aligned text in the middle of a line to the left side of some right-aligned text on the same line. In XSL-FO both leaders and rules are produced by the fo:leader element. This is an inline element that represents a leader, although it can easily serve as a rule by placing it inside a fo:block.
  <fo:leader leader-length="7.5in" leader-pattern="rule" rule-thickness="2pt" color="green"/>
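The nesting of the table elements can be sketched with a small two-column table. The column widths and cell texts are made up for illustration.

```xml
<fo:table xmlns:fo="http://www.w3.org/1999/XSL/Format"
    table-layout="fixed" width="100%">
  <fo:table-column column-width="4cm"/>
  <fo:table-column column-width="8cm"/>
  <fo:table-header>
    <fo:table-row>
      <fo:table-cell><fo:block font-weight="bold">Element</fo:block></fo:table-cell>
      <fo:table-cell><fo:block font-weight="bold">Role</fo:block></fo:table-cell>
    </fo:table-row>
  </fo:table-header>
  <fo:table-body>
    <fo:table-row>
      <fo:table-cell><fo:block>fo:table-row</fo:block></fo:table-cell>
      <fo:table-cell><fo:block>groups the cells of one row</fo:block></fo:table-cell>
    </fo:table-row>
    <fo:table-row>
      <fo:table-cell><fo:block>fo:table-cell</fo:block></fo:table-cell>
      <fo:table-cell><fo:block>holds block-level content</fo:block></fo:table-cell>
    </fo:table-row>
  </fo:table-body>
</fo:table>
```

Note that the text of each cell is wrapped in a fo:block, because a fo:table-cell holds block-level content rather than raw text.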
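Both uses of fo:leader can be sketched side by side. The ref-id value ch1 is hypothetical and assumes that some formatting object elsewhere in the document carries id="ch1".

```xml
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- a rule: a leader alone in its own block -->
  <fo:block>
    <fo:leader leader-length="100%" leader-pattern="rule"
        rule-thickness="1pt"/>
  </fo:block>
  <!-- a leader: dots between left-aligned text and a right-aligned page number -->
  <fo:block text-align-last="justify">
    Chapter 1<fo:leader leader-pattern="dots"/><fo:page-number-citation ref-id="ch1"/>
  </fo:block>
</fo:block>
```

Setting text-align-last="justify" is what stretches the leader so that the page number ends up at the right edge, as in a table of contents.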
Graphics
The fo:external-graphic element provides the equivalent of an HTML IMG element. That is, it loads an image, probably in a non-XML format, from a URL. fo:external-graphic is always an empty element with no children. The src attribute contains a URI identifying the location of the image to be embedded. A standard HTML IMG element therefore translates directly into a fo:external-graphic whose src attribute names the same image, and the URI may be relative or absolute. Just as with Web browsers and HTML, there's no guarantee that any particular formatting engine recognizes and supports any particular graphic format. Currently, FOP supports GIF, JPEG, and SVG images. PNG is supported if you have Sun's JIMI library installed. More formats may be added in the future. (Open question: no support for MathML?) fo:external-graphic is an inline element. You can make it a block-level picture simply by wrapping it in a fo:block element.
Embedding SVG
The fo:instream-foreign-object inserts a graphic element that is described in XML and that is included directly in the XSL-FO document. For example, a fo:instream-foreign-object element might contain an SVG picture. The formatter would render the picture in the finished document.
  page <svg xmlns=" width="1.5cm" height="1cm">
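The original image examples were lost from the slide, so the file name logo.gif below is purely hypothetical. The sketch shows the inline use and the block-level use, with the URI written in the url(...) notation.

```xml
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- inline: the image sits in the line of text -->
  <fo:block>
    Our logo: <fo:external-graphic src="url(logo.gif)"/>
  </fo:block>
  <!-- block-level: the image alone in its own centered block -->
  <fo:block text-align="center">
    <fo:external-graphic src="url(logo.gif)"/>
  </fo:block>
</fo:block>
```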
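A complete embedding might look like the sketch below. The width and height come from the slide fragment and the namespace is the standard SVG one; the rectangle is an assumed drawing, since the original picture content is not preserved.

```xml
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:instream-foreign-object>
    <svg xmlns="http://www.w3.org/2000/svg" width="1.5cm" height="1cm">
      <!-- a simple outlined box filling the picture area -->
      <rect x="0" y="0" width="1.5cm" height="1cm"
          fill="none" stroke="black"/>
    </svg>
  </fo:instream-foreign-object>
</fo:block>
```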
Formatting Properties
By themselves, formatting objects say relatively little about how content is formatted. They merely put content in abstract boxes, which are placed in particular parts of a page. Attributes on the various formatting objects determine how the content in those boxes is styled. As already mentioned, there are more than 200 different formatting properties. Not all properties can be attached to all elements. For instance, there isn't much point in specifying the font-style of a fo:external-graphic. Most properties, however, can be applied to more than one kind of formatting object element. (The few that can't, such as src and provisional-label-separation, were discussed above with the formatting objects they apply to.) When a property is common to multiple formatting objects, it shares the same syntax and meaning across the objects. For example, you use identical code to format a fo:title in 14-point Times bold as you do to format a fo:block in 14-point Times bold. Many of the XSL-FO properties are similar to CSS properties.
Language Property
The language property specifies the language of the content contained in either a fo:block or a fo:character element. Generally, the value of this property is an ISO 639 language code such as en (English) or la (Latin). It may also be the keyword none or use-document. The latter means to simply use the language of the input as specified by the xml:lang attribute. For example, consider the first verse of Caesar's Gallic Wars, which would be given the language code la:
  Gallia est omnis divisa in partes tres, quarum unam incolunt Belgae, aliam Aquitani, tertiam qui ipsorum lingua Celtae, nostra Galli appellantur
Although the language property has no direct effect on formatting, it may have an indirect effect if the formatter selects layout algorithms depending on the language. For instance, the formatter should use different default writing modes for Arabic and English text. This carries over into determination of the start and end regions and the inline and block progression directions.
Paragraph Property
Paragraph properties are styles that normally are thought of as applying to an entire block of text in a traditional word processor, although perhaps block-level text properties is a more appropriate name here. For example, indentation is a paragraph property, because you can indent a paragraph, but you can't indent a single word.
  break: The break properties specify where page breaks are and are not allowed: keep-with-next, keep-with-previous, keep-together, break-before, break-after.
  indent: The indent properties specify how far lines are indented from the edge of the text. There are four of these: start-indent, end-indent, text-indent, last-line-end-indent.
  hyphenation: The hyphenation properties determine where hyphenation is allowed and how it should be used. These properties apply only to soft or "optional" hyphens such as the ones sometimes used to break long words at the end of a line.
Sentence Property
Sentence properties apply to groups of characters, that is, a property that makes sense only for more than one letter at a time, such as how much space to place between letters or words.
  Letter spacing properties: Kerning of text is a slippery measure of how much space separates two characters. It's not an absolute number. Most formatters adjust the space between letters based on local necessity, especially in justified text. Furthermore, high-quality fonts use different amounts of space between different glyphs. However, you can make text looser or tighter overall. The letter-spacing property adds additional space between each pair of glyphs, beyond that provided by the kerning. It's given as a signed length specifying the desired amount of extra space to add. For example: This is fairly loose text
  word-spacing: The word-spacing property adjusts the amount of space between words.
  text-align: The text-align and text-align-last properties specify how the inline content is horizontally aligned within its box.
    start: Left-aligned in left-to-right languages like English
    center: Centered
    end: Right-aligned in left-to-right scripts
    justify: Expanded with extra space as necessary to fill out the line
    left: Align with the left side of the page regardless of the writing direction
    right: Align with the right side of the page regardless of the writing direction
    inside: Align with the inside edge of the page; that is, the right edge on the left page of two facing pages or the left edge on the right page of two facing pages
    …
Character Property
Character properties describe the qualities of individual characters.
  color property: The color property sets the foreground color of the contents using the same syntax as the CSS color property. For example, this fo:inline colors the text "Institute of Technology!" pink: Institute of Technology! Colors are specified in much the same way as they are in CSS; that is, as hexadecimal triples in the form #RRGGBB or as one of the 16 named colors aqua, black, blue, fuchsia, gray, green, lime, maroon, navy, olive, purple, red, silver, teal, white, and yellow. Example: This sentence is yellow.
  font properties: Any object that holds text can have a wide range of font properties, including:
    font-family: A list of font names in order of preference
    font-size: A length
    font-size-adjust: The preferred ratio between the x-height and size of a font
    font-stretch: The "width" of a font, given as one of the keywords condensed, expanded, extra-condensed, extra-expanded, narrower, normal, semi-condensed, semi-expanded, ultra-condensed, ultra-expanded …
    font-style: The style of font, specified as one of the keywords italic, normal, oblique, reverse-normal, or reverse-oblique
    font-variant: Either normal or small-caps
    font-weight: The thickness of the strokes that draw the font, given as one of the keywords 100, 200, 300, 400, 500, 600, 700, 800, 900, bold, bolder, lighter, or normal
  text properties: The text properties apply styles to text that are independent of the font chosen. These include text-transform, text-shadow, text-decoration, and score-spaces.
Block's Margins
Blocks accept the CSS-style margin properties margin-top, margin-bottom, margin-left, and margin-right. However, these properties are only here for compatibility with CSS. In general, it's recommended that you use these four properties instead because they fit better in the XSL-FO formatting model: space-before, space-after, start-indent, end-indent. The space-before and space-after properties are equivalent to the margin-top and margin-bottom properties respectively. The start-indent property is equivalent to the sum of padding-left, border-left-width, and margin-left. The end-indent property is equivalent to the sum of padding-right, border-right-width, and margin-right.
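To illustrate that a shared property has the same syntax on different objects, here is a sketch that applies 14-point Times bold once to a fo:block and once to a fo:inline. The sample texts are invented.

```xml
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- the three font properties on a block -->
  <fo:block font-family="Times" font-size="14pt" font-weight="bold">
    A heading in 14-point Times bold
  </fo:block>
  <!-- the same three properties, written identically, on an inline -->
  <fo:block>
    Body text with <fo:inline font-family="Times" font-size="14pt"
    font-weight="bold">an inline span</fo:inline> styled the same way.
  </fo:block>
</fo:block>
```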
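The original markup for the Latin example is not preserved; a plausible sketch marks the verse as Latin with the language property on a fo:block.

```xml
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format" language="la">
  Gallia est omnis divisa in partes tres, quarum unam incolunt Belgae,
  aliam Aquitani, tertiam qui ipsorum lingua Celtae, nostra Galli
  appellantur
</fo:block>
```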
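The three groups of paragraph properties could be combined as in this sketch. All values and texts are assumptions chosen for illustration.

```xml
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- heading: starts a new page and stays with the paragraph after it -->
  <fo:block break-before="page" keep-with-next.within-page="always"
      font-weight="bold">
    Chapter heading
  </fo:block>
  <!-- paragraph: indented, kept on one page, hyphenated as English -->
  <fo:block start-indent="1cm" end-indent="1cm" text-indent="1em"
      keep-together.within-page="always"
      hyphenate="true" language="en">
    The first line of this paragraph is indented by one em beyond the
    start indent of the block.
  </fo:block>
</fo:block>
```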
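A sketch of the sentence-level properties follows. The amounts 2pt and 3pt are assumptions, since the values used on the original slide are not preserved.

```xml
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- extra space between glyphs and between words -->
  <fo:block letter-spacing="2pt" word-spacing="3pt">
    This is fairly loose text
  </fo:block>
  <!-- justified paragraph whose last line stays start-aligned -->
  <fo:block text-align="justify" text-align-last="start">
    A longer paragraph is stretched to fill every line except the last
    one, which is aligned with the start edge.
  </fo:block>
</fo:block>
```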
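The character properties can be sketched as follows. The hexadecimal value #FFC0CB is the CSS color pink, used here because pink is not among the 16 named colors listed above; the font choices are assumptions.

```xml
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format"
    font-family="Helvetica, sans-serif" font-size="12pt">
  <!-- color given as a hexadecimal #RRGGBB triple -->
  <fo:inline color="#FFC0CB">Institute of Technology!</fo:inline>
  <!-- color given by name, combined with font and text properties -->
  <fo:inline color="yellow" font-style="italic"
      text-decoration="underline">This sentence is yellow.</fo:inline>
</fo:block>
```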
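A block spaced with the four recommended properties might look like this sketch; the lengths are arbitrary.

```xml
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format"
    space-before="12pt" space-after="6pt"
    start-indent="1cm" end-indent="1cm">
  A paragraph set off from its neighbours with the XSL-FO spacing
  properties instead of the CSS-style margins.
</fo:block>
```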
New page