Using XML, XSLT, and CSS in a Digital Library
  Markup Transformations
  SGML to XML Conversions
  Metadata Schema & Generation
  Robert Ferrer
  r-ferrer@uiuc.edu
  ASIS Annual Meeting 2000
SGML to XML Conversions - Modular 15 November 2000 ASIS Annual Meeting 2000
SGML to XML Conversions - Basic
  Empty tags: <empty> to <… />
  Processing instructions: <?Processing Instruction> to <?… ?>
  CDATA to CDATA sections: <![CDATA[ … ]]>
  Named entities remain unchanged, e.g. &alpha;
  <!DOCTYPE ...> refers to an XML DTD containing only character entity definitions that map to Unicode code points
    <!ENTITY alpha "α">
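The first two rewrites on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's converter: the set of empty element names is hypothetical (in practice it would come from the SGML DTD), and the regular expressions assume attribute values never contain ">".

```python
import re

# Hypothetical list of elements declared EMPTY in the source SGML DTD.
EMPTY_ELEMENTS = {"citeref", "fig", "br"}


def sgml_to_xml_basic(text, empty_elements=EMPTY_ELEMENTS):
    """Close processing instructions and empty tags the XML way."""

    def fix_pi(match):
        # <?target data>  ->  <?target data?>  (already closed PIs are kept)
        return "<?" + match.group(1).rstrip("?") + "?>"

    text = re.sub(r"<\?([^>]*)>", fix_pi, text)

    def fix_empty(match):
        name, attrs = match.group(1), match.group(2)
        already_closed = attrs.rstrip().endswith("/")
        if name.lower() not in empty_elements or already_closed:
            return match.group(0)
        # <name attrs>  ->  <name attrs/>
        return "<" + name + attrs.rstrip() + "/>"

    # Start tags only: "<" followed by a name and optional attributes.
    return re.sub(r"<([A-Za-z][\w.-]*)((?:\s[^<>]*)?)>", fix_empty, text)
```

Calling `sgml_to_xml_basic('<p>See <citeref refid="bib5"> and <?page break></p>')` closes the `citeref` tag and the processing instruction and leaves `<p>` alone.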
SGML to XML Conversions - Linking
  Attributes to facilitate internal linking
    <CITEREF REFID="bib5" idli_occurrence="3" />
  External links represented as XLinks
    <FIG NAME="F1" xlink:type="simple" xlink:href="fig1.jpg" xlink:show="new" xlink:actuate="user" />
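As a rough sketch of the external-link step, the XLink attributes can be attached to an element with Python's standard ElementTree module. The helper name `add_simple_xlink` is made up for this example, and the namespace URI is an assumption (the one published for XLink), since the slide does not show it. The default values mirror the slide; note that the final XLink 1.0 recommendation spells the `user` actuate value as `onRequest`.

```python
import xml.etree.ElementTree as ET

# Assumption: the XLink namespace URI; the slide does not spell it out.
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("xlink", XLINK_NS)


def add_simple_xlink(element, href, show="new", actuate="user"):
    """Mark an element as a simple XLink pointing at href."""
    element.set("{%s}type" % XLINK_NS, "simple")
    element.set("{%s}href" % XLINK_NS, href)
    element.set("{%s}show" % XLINK_NS, show)
    element.set("{%s}actuate" % XLINK_NS, actuate)
    return element


fig = add_simple_xlink(ET.fromstring('<FIG NAME="F1"/>'), "fig1.jpg")
print(ET.tostring(fig, encoding="unicode"))
```

Existing attributes such as `NAME` are left untouched; only the four `xlink:` attributes are added.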
SGML to XML Conversions - Math
  SGML Math converted to MathML
  Presentational MathML
    <math xmlns="http://www.w3.org/…">
      <msubsup>
        <mrow><mi>α</mi></mrow>
        <mrow><mi>i</mi></mrow>
        <mrow><mo>-</mo><mn>2</mn></mrow>
      </msubsup>
    </math>
  ISO 12083 Math
    <dformula> <g>a</g> <sup>-2</sup> <inf>i</inf> </dformula>
  Identify & translate mathematical character references
  Identify & tokenize mathematical content
SGML to XML Conversions - Math
  Recognize & transform mathematical markup
    <xsl:template match="dformula">
      :
      <xsl:when test="sup or inf">
        <xsl:for-each select="child::node()">
          <xsl:choose>
            <xsl:when test="name(self::node())='sup' and name(following-sibling::node()[1])='inf'">
              <xsl:element name="msubsup" namespace="http://www.w3.org/…">
                <xsl:element name="mrow" namespace="http://www.w3.org/…">
                  <xsl:apply-templates select="preceding-sibling::node()[1]"/>
                </xsl:element>
                :
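The rule in this template (a base followed by `sup` and then `inf` becomes one `msubsup`) can be restated as a small Python sketch. It is not the project's stylesheet. Assumptions: the namespace constant is the standard MathML URI (the slides elide it), the `GREEK` table is a tiny illustrative stand-in for the character-reference translation step, and text placed directly inside `dformula` is ignored.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: standard MathML namespace; the slides elide the URI.
MATHML_NS = "http://www.w3.org/1998/Math/MathML"
# Illustrative mapping for <g> (Greek) content; not the full ISO table.
GREEK = {"a": "\u03b1", "b": "\u03b2"}
# A number, a single letter, or any other non-space character.
TOKEN = re.compile(r"\d+(?:\.\d+)?|[^\W\d_]|\S")


def m(tag):
    """Qualify a local name with the MathML namespace."""
    return "{%s}%s" % (MATHML_NS, tag)


def tokenize(text):
    """Wrap text in an <mrow> of <mn>, <mi> and <mo> tokens."""
    row = ET.Element(m("mrow"))
    for tok in TOKEN.findall(text or ""):
        kind = "mn" if tok[0].isdigit() else "mi" if tok.isalpha() else "mo"
        ET.SubElement(row, m(kind)).text = tok
    return row


def base_row(node):
    """Tokenize a base element, translating <g> letters to Greek."""
    text = node.text or ""
    if node.tag == "g":
        text = "".join(GREEK.get(ch, ch) for ch in text)
    return tokenize(text)


def convert_dformula(dformula):
    """Turn base + <sup> + <inf> runs into <msubsup>; copy the rest."""
    math = ET.Element(m("math"))
    kids = list(dformula)
    i = 0
    while i < len(kids):
        if [k.tag for k in kids[i + 1:i + 3]] == ["sup", "inf"]:
            node = ET.SubElement(math, m("msubsup"))
            node.append(base_row(kids[i]))           # base
            node.append(tokenize(kids[i + 2].text))  # subscript from <inf>
            node.append(tokenize(kids[i + 1].text))  # superscript from <sup>
            i += 3
        else:
            math.append(base_row(kids[i]))
            i += 1
    return math
```

Feeding it the ISO 12083 fragment `<dformula><g>a</g><sup>-2</sup><inf>i</inf></dformula>` yields a single `msubsup` whose children are the base, the subscript and the superscript, in the order MathML requires.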
SGML to XML Conversions - TeX
  TeX converted to GIF images
    <FORM NOTATION="TEX" HIDE="TRUE"> $$ (j_0-a_2')\,{\rm mod}\,P $$</FORM>
    <uie name="uie1" xlink:type="simple" xlink:href="fig1.gif" xlink:show="new" xlink:actuate="user" />
  TeX converted into MathML
    IBM TechExplorer
    $$ (j_0-a_2')\,{\rm mod}\,P
    <math><mo>(</mo><msub> <mrow><mi>j</mi></mrow> <mrow><mn>0</mn></mrow> </msub><mi>−</mi> <msubsup><mrow><mi>a</mi> </mrow><mrow><mn>2</mn>…..
SGML to XML Conversions - DTD
  XML DTD does not permit inclusions and exclusions
    SGML: <!ELEMENT Article - - (front, body) +(%i.float;)>
    XML:  <!ELEMENT Article (front | body | %i.float;)*>
  XML DTD does not permit the '&' connector
  XML DTD restricts mixed content to the form (#PCDATA | name | ...)*, so a model such as the following is not permitted
    <!ELEMENT Other ((author, journal) | (#PCDATA))>
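The inclusion rewrite shown on this slide can be mimicked by a short Python sketch. It is an illustration under stated assumptions, not the conversion actually used: `loosen_declaration` is a made-up helper, it handles only flat (non-nested) content models, and it does nothing about the '&' connector when no inclusion is present. The result is deliberately looser than the SGML original: element order is lost, and the inclusion is allowed only at this level, whereas SGML inclusions also apply inside descendants.

```python
import re

DECL = re.compile(
    r"<!ELEMENT\s+(?P<name>\S+)\s+"
    r"(?P<min>[-Oo]\s+[-Oo]\s+)?"      # SGML tag-omission indicators
    r"\((?P<model>[^()]*)\)"           # flat content model only
    r"(?:\s*\+\((?P<incl>[^()]*)\))?"  # optional inclusion group
    r"\s*>"
)


def loosen_declaration(sgml_decl):
    """Rewrite one SGML element declaration into XML DTD syntax."""
    match = DECL.fullmatch(sgml_decl.strip())
    if match is None:
        raise ValueError("unsupported declaration: " + sgml_decl)
    name, model, incl = match.group("name", "model", "incl")
    if incl is None:
        # No inclusion: only the omission indicators are dropped.
        return "<!ELEMENT %s (%s)>" % (name, model)
    # Fold model members and inclusions into one repeatable choice.
    parts = re.split(r"\s*[,|&]\s*", model.strip())
    parts += re.split(r"\s*\|\s*", incl.strip())
    return "<!ELEMENT %s (%s)*>" % (name, " | ".join(parts))
```

Applied to the SGML declaration on the slide, the function reproduces the XML declaration shown next to it.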
Metadata - Usage
  Metadata within the DLI Testbed
    Normalize key fields from different publisher DTDs to facilitate searching
    Provide common and easily displayable intermediate search results
    Add value in the form of links to cited or citing articles within the Testbed, external abstracts and indexes, etc.
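Normalizing key fields across publisher DTDs amounts to a per-publisher lookup from source elements to common field names. A minimal Python sketch follows; the publisher keys and the element names (`atl`, `au`, `title`, `author`) are invented for illustration and are not taken from any publisher's DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: common field name -> element path, per publisher.
FIELD_PATHS = {
    "publisher_a": {"Title": ".//atl", "Creator": ".//au"},
    "publisher_b": {"Title": ".//title", "Creator": ".//author"},
}


def normalize(root, publisher):
    """Collect whitespace-normalized field values under common names."""
    record = {}
    for field, path in FIELD_PATHS[publisher].items():
        record[field] = [
            " ".join("".join(el.itertext()).split())
            for el in root.findall(path)
        ]
    return record
```

Two articles marked up under different DTDs then produce records with the same keys, which is what lets one search interface cover both.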
Metadata - Schema
  Resource Description Framework (RDF) provides a standardized way to represent metadata using XML
    Encapsulates metadata elements
    Provides varying levels of granularity
    RDF container objects describe the relations between repeated metadata elements
Metadata - Schema
  Dublin Core (DC) model is used to encapsulate all searchable metadata
  Provides the semantic framework for describing each object in the collection
    Content        Intellectual Property   Instantiation
    Title          Creator                 Date
    Subject        Publisher               Format
    Description    Contributor             Identifier
    Type           Rights                  Language
    Source
    Relation
    Coverage
Metadata - Schema
  Extensive custom IDLI tags are included
    Offer a further level of granularity
      <DC:Description><idli:Abstract>…</idli:Abstract></DC:Description>
    Search clients familiar with the IDLI schema can achieve much greater precision
  Dublin Core Qualifiers (DCQ) substructure to replace many of the project-specific IDLI elements
      <DC:Description><DCQ:Abstract>…</DCQ:Abstract></DC:Description>
Metadata - Schema
    <rdf:seq>
      <rdf:li>
        <dc:Creator>
          <idli:author_name>Giust, G. K.</idli:author_name>
          <idli:organization_name>Department of Electrical Engineering, Arizona State University</idli:organization_name>
        </dc:Creator>
      </rdf:li>
      <rdf:li>
        <dc:Creator>
          <idli:author_name>Sigmon, T.W.</idli:author_name>
          <idli:organization_name>Department of Computer Science, Illinois State University</idli:organization_name>
        </dc:Creator>
      </rdf:li>
    </rdf:seq>
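An ordered creator list of this shape can be generated with Python's ElementTree, sketched below. None of the namespace URIs appear on the slides, so all three are assumptions: the RDF and Dublin Core URIs are the commonly published ones, and the `idli` URI is a placeholder. The container is written `rdf:Seq`, the spelling used in the RDF specification, where the slide has `rdf:seq`.

```python
import xml.etree.ElementTree as ET

# Assumptions: namespace URIs are not given on the slides.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
IDLI = "http://example.org/idli#"  # placeholder, not the project's URI

for prefix, uri in (("rdf", RDF), ("dc", DC), ("idli", IDLI)):
    ET.register_namespace(prefix, uri)


def q(namespace, local):
    """Build a namespace-qualified tag name for ElementTree."""
    return "{%s}%s" % (namespace, local)


def creators_seq(authors):
    """Build an rdf:Seq with one dc:Creator per (name, organization)."""
    seq = ET.Element(q(RDF, "Seq"))
    for name, organization in authors:
        item = ET.SubElement(seq, q(RDF, "li"))
        creator = ET.SubElement(item, q(DC, "Creator"))
        ET.SubElement(creator, q(IDLI, "author_name")).text = name
        ET.SubElement(creator, q(IDLI, "organization_name")).text = organization
    return seq
```

Because `rdf:Seq` is an ordered container, author order is carried by the position of each `rdf:li`, and each author name stays paired with its own organization.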
Metadata - Extracting
  Metadata is extracted from the ‘base’ XML files
  Utilization of XML Header
    DTD is used to resolve entities
    XML-Stylesheet processing instruction
  Visual Basic application serves as parser
    Document Object Model (DOM)
    XSLT Style Sheets
Metadata - Extracting
  Utilization of XSLT Style Sheets
    XSLT transformative features to generate base metadata file and forward citation fragment
    XSLT scripting features to generate elements not directly expressed in the document
    XSLT instantiation of ActiveX objects to test for links
Metadata - Extracting
  Utilization of DOM
    Insert pseudo elements (e.g. bibliographic data)
    Search reference citations from the generated metadata object to insert forward references into other metadata files
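The forward-reference step can be sketched with Python's `xml.dom.minidom` in place of the Visual Basic DOM code described on the slides. Everything named here is hypothetical: the element names `reference` and `cited_by`, the attributes `target` and `source`, and the idea of holding all metadata documents in one dictionary keyed by article id.

```python
from xml.dom import minidom


def insert_forward_references(docs):
    """For each citation A -> B, record 'cited by A' in B's metadata.

    docs maps an article id to its parsed metadata document.
    """
    for source_id, doc in docs.items():
        for ref in doc.getElementsByTagName("reference"):
            target = docs.get(ref.getAttribute("target"))
            if target is None:
                continue  # cited article is not held in this collection
            node = target.createElement("cited_by")
            node.setAttribute("source", source_id)
            target.documentElement.appendChild(node)
    return docs


docs = {
    "a1": minidom.parseString(
        '<metadata><reference target="a2"/><reference target="x9"/></metadata>'
    ),
    "a2": minidom.parseString("<metadata/>"),
}
insert_forward_references(docs)
print(docs["a2"].documentElement.toxml())
```

Citations to articles outside the collection (here `x9`) are skipped, so only documents already in the collection gain forward references.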