Presentation is loading. Please wait.

Presentation is loading. Please wait.

MathML and Digital Libraries

Similar presentations


Presentation on theme: "MathML and Digital Libraries"— Presentation transcript:

1 MathML and Digital Libraries
Timothy W. Cole University of Illinois at Urbana-Champaign Workshop on International Scientific Data, Standards & Digital JCDL 2005, Denver, CO

2 University of Illinois at UC
Digital Typography? Typography, when considered “as the servant of mathematics,” should have as a goal “to communicate mathematics effectively by making it possible to publish mathematical papers and books of high quality, without excessive cost.” -- Donald Knuth (inventor of TEX), 1978 Where do we go from TEX? 11 June 2005 University of Illinois at UC

3 University of Illinois at UC
MathML Antecedents HTML Math (Dave Raggett, 1994; HTML 3.0/3.2) Limited functionality Community preference to keep HTML simple, generic SGML ISO (1994) Mathematics DTD Exclusively presentational Not as sophisticated as MathML W3C Math Working Group Formed 1997 MathML 1.0 released 1998 MathML 2.0 released 2001 11 June 2005 University of Illinois at UC

4 MathML – Presentation Markup
Used to describe the layout structure of mathematical notation – focus on visual constructs (about 30 elements). Includes: Token elements – primarily PCData content models Layout Schemata, for building expressions – element content models Semantic hints via invisible characters are encouraged <math xmlns=' <mrow> <msqrt> <mi>z</mi> </msqrt> <mo>=</mo> <msup> <mi>z</mi> <mfrac><mn>1</mn><mn>2</mn></mfrac> </msup> </mrow> </math> 11 June 2005 University of Illinois at UC

5 MathML – Content (Semantic) Markup
Intended to support encoding of the underlying mathematical structure of an expression, rather than any particular rendering for the expression – about 120 elements Limited scope, designed to express “commonplace mathematical constructs,” i.e., through about first 2 years of college (calculus) can be extended,e.g., through the use of OpenMath <math xmlns=' <apply><eq/> <apply><root/> <degree><cn type='integer'>2</cn></degree><ci>z</ci></apply> <apply><power/> <ci>z</ci><cn type='rational'>1<sep/>2</cn></apply> </apply> </math> 11 June 2005 University of Illinois at UC

6 Displaying MathML in Web Browsers
Mozilla / Netscape 7.x renders MathML natively But not all elements supported, or supported well Internet Explorer does not render MathML natively But plug-ins available – TechExplorer, MathPlayer Syntax for using MathML different between browsers W3C provides XSLT that allows a page to viewable in both Rendering quality also depends on fonts available Thousands of code points have been added to Unicode STIXFonts.org is creating more than 4,000 freely available new glyphs for these Unicode points (rest were pre-existing) 11 June 2005 University of Illinois at UC

7 Other Software Support for MathML (2004)
Editors: MathType + Word, Integre MathML Editor, SciWriter, MathFlow, Scientific Workplace, MathFlow + Epic/XMetaL, OpenOffice, others Interactive Components: Techexplorer, WebEQ TeX Converters: TeXML, Hermes, tex4ht, others Computation: Maple, Mathematica, REDUCE, others 11 June 2005 University of Illinois at UC

8 University of Illinois at UC
Using MathML (as of 2004) MathML incorporated into a number of document types: Docbook, NIH Journal Archiving & Publishing, etc. MathML is being used in STM publications : In production: American Institute of Physics, US Patent Office In pilot: Springer, Wiley, Marcel Dekker, Elsevier, others MathML behind the scenes in e-Learning software: Blackboard, WebCT, eCollege, Diploma.EDU, MapleTA, other BUT adoption by author community limited to date 11 June 2005 University of Illinois at UC

9 Current Status & Open Issues
MathML 2.0 is stable, available, & in increasing use Proving useful for many publication & educational applications Tailored for use on the Web; interactivity demostrated Commitment to stay open & maintain backward compatibility Limitations, Issues, & Needs Utility for research mathematics? Use in / cross-walks with other Sci-Tech MLs Low-level: fonts / glyphs, character entity definitions, … Semantic utility not yet proven (being investigated): - Comparing MathML expressions for degree of equivalence - Searching by MathML expressions 11 June 2005 University of Illinois at UC

10 An Example: Wolfram Functions Website

11 MathML in Functions Site

12 University of Illinois at UC
But… Rendering of MathML typically not exactly the same application to application For most complex functions, MathML not enough – must augment with annotations meaningful to Mathematica Applications still rely on something other than MathML for complex, behind the scenes functionality No canonical forms – no accepted normalization rules 11 June 2005 University of Illinois at UC

13 Experiment: Using MathML to Link Literature
Hypotheses: Readers of a journal article may want to know more about mathematical functions used in article … And vice-versa Functions site keywords/phrases & MathML found in body of articles may be useful for linking, search & discovery Approach: Examine journal article testbed for presence of Functions site keywords; add links & terminology to article metadata Extract selected MathML occurrences from article body; “normalize”; add to article metadata 11 June 2005 University of Illinois at UC

14 Article Testbed: Illinois DLI-I / D-Lib Test Suite
Funded under DLI-I (NSF, DARPA, & NASA) Continued under CNRI’s D-Lib Test Suite Featured: Multi-publisher, Markup-Based Full-Text Journal Testbed. Studies of Processing, Indexing, Normalization, Retrieval, Rendering, Linking & End-User Searching Needs. Testbed contains 75,000+ Articles from 60 Journal Titles Received as SGML (various DTDs); converted to XML Content from AIP, APS, ASCE, IEE, ASM, ACM, Elsevier Additional support from IEEE, NRL, NTT Learning Systems 11 June 2005 University of Illinois at UC

15 University of Illinois at UC
Searching for Occurrences of Functions Site Metadata Terms in Article Testbed Functions Site: 83,000 pages containing keywords in HTML <meta> elements Approximately 7,500 unique keywords Not all keywords useful for full-text search Eliminated numerals, Mathematica expressions, single word phrases, phrases containing selected words About 1,000 keywords left after automated filtering 367 of these keywords appear in journal article testbed 44,000 articles contain at least one keyword On average each of those 44,000 contain 2 keywords 11 June 2005 University of Illinois at UC

16 Issues in Using Functions Keywords/phrases for Characterizing Articles
Vocabulary switching Keywords in mathematics, not necessarily terms used in physics, electrical engineering, & computer science Meaning of words in full-text less precise Synonyms, less specific use, etc. Context – same descriptive keywords may map to many different branches of Functions site E.g., 300 occurrences of “q-series” in Functions site; which are relevant to 18 articles containing this keyword? Use hierarchical tree of Functions Site to place user in reasonable proximity 11 June 2005 University of Illinois at UC

17 Using MathML Found in Article Body
2.5 million occurrences of MathML in testbed, but most not useful for searching & linking MathML used for greek characters in text Many instances short, inline fragments Quality – many instances can’t be parsed by Mathematica Criteria Explicitly equations (block display format) Minimum length, 200 bytes Accepted by Mathematica kernel About 3% of occurrences meet criteria 11 June 2005 University of Illinois at UC

18 Observations from this early experiment
Currently, text cues in articles seem more effective for linking than mathematical cues Mathematics embedded in articles needs lots of processing to begin to be useful for discovery, linking, … We need to understand mathematical equivalents of language grammar, noun-phrases, … 11 June 2005 University of Illinois at UC

19 Searching Mathematics Directly

20 Search Results

21 More Like This One

22 University of Illinois at UC
Links Wolfram Functions Website W3C Math Working Group Design Science Enhancing the Searching of Mathematics (IMA Hot Topic) 11 June 2005 University of Illinois at UC


Download ppt "MathML and Digital Libraries"

Similar presentations


Ads by Google